DOCUMENT RESUME 

ED 256 082 EC 170 4*9


AUTHOR Scruggs, Thomas E. 

TITLE The Administration and Interpretation of Standardized
Achievement Tests with Learning Disabled and
Behaviorally Disordered Elementary School Children.
Final Report.

INSTITUTION Utah Univ., Salt Lake City.

SPONS AGENCY Special Education Programs (ED/OSERS), Washington, 

DC. 

PUB DATE 2 Jul 84

NOTE 172p.; Developed at the Developmental Center for the 

Handicapped. For the test taking skills training 
materials, see EC 170 490. 

PUB TYPE Reports - Research/Technical (143) 

EDRS PRICE MF01/PC07 Plus Postage. 

DESCRIPTORS Achievement Tests; Attention Control; *Behavior
Disorders; Elementary Education; *Learning
Disabilities; *Student Attitudes; Test Anxiety; Test
Coaching; *Test Wiseness

ABSTRACT 

Several experiments were carried out to determine:
(1) whether learning disabled (LD) and behaviorally disordered (BD)
students exhibit deficiencies with respect to appropriate test-taking
strategies and (2) if so, whether these strategies could be
successfully trained. In the test-training evaluation, 92 LD or BD
elementary-age students representing grades 2, 3, and 4 were randomly
assigned to treatment or control conditions. Treatment subjects
received eight training sessions on test-taking skills, with
particular regard to the Stanford Achievement Test. All treatment
students scored significantly higher on a test of test-taking skills.
In addition, third and fourth grade LD and BD students scored
Abstract

Several experiments were carried out over the course of a 12-month
period to determine whether: (a) learning disabled (LD) and
behaviorally disordered (BD) students exhibit deficiencies with
respect to appropriate test-taking strategies, and, if so, (b)
whether these strategies could be successfully trained.
Preliminary investigations indicated that mildly handicapped
students do exhibit deficiencies in the area of test-taking
strategies. These deficiencies include attention to inappropriate
distractors, failure to successfully employ prior knowledge and
deductive reasoning strategies, and failure to identify correctly
specific types of questions which call for different strategies.
In the test-training evaluation, approximately 100 LD and BD
elementary-age students, representing grades 2, 3, and 4, were
randomly assigned to treatment and control conditions. Treatment
subjects received eight training sessions on test-taking skills
with particular regard to the Stanford Achievement Test. All
treatment students scored significantly higher on a test of
test-taking skills. In addition, third and fourth grade LD and BD
students scored significantly higher on the Word Study Skills
subtest and exhibited descriptive increases over the control group
with respect to other subtests. Second grade students were
apparently unaffected by the training procedure. In addition, a
similar test-training package applied to intact third grade
classrooms of mostly nonhandicapped students indicated that these
materials were successful in improving student attitudes toward the
test-taking experience.
PROJECT OVERVIEW

The primary objective of this project was to determine
whether scores on standardized achievement tests could be improved
through a combination of reinforcement, practice, and training of
"test-taking skills"; that is, those skills which refer to
understanding of the most efficient means to take a test, rather
than knowledge of the content area. Such training, if successful,
would likely improve the validity of resulting test scores in that
a potential source of error, i.e., difficulty with format, testing
conditions, etc., would be eliminated. In addition to the major
objectives, several smaller investigations were planned and
carried out, the ultimate objective of which was to determine
whether, in fact, students in special education placement
exhibited specific deficiencies on select aspects of test-taking.

In addition, another test-training investigation was carried out
on intact third grade classes in order to determine whether such a
training package was appropriate to whole class administration and
whether such training produced any change in on-task behavior or
attitudes toward the test-taking experience. Approximately 15% of
this population was classified as learning disabled or
behaviorally disordered.

Preliminary Investigations

In general, the project has proceeded in accordance with the
planned schedule of activities in the proposal. However, when the
proposal was prepared, it was assumed that materials development
would not be necessary, as materials had been developed from a
prior project and were at that time being validated. Since this
project was funded, however, it has been determined that those
materials as implemented were not effective in increasing the
performance of students in regular education classes on
standardized achievement tests. It was, therefore, thought
necessary to initiate a series of studies to evaluate what
specific skills lower functioning students may lack with respect
to test taking, and to develop a new set of materials which might
more specifically address these needs. Accomplishments are
described below by each task.



1. Assessment of spontaneously employed test-taking
strategies (July-December, 1983). A shorter version of the
Stanford Achievement Test Reading subtests, a questionnaire form,
and a follow-along sheet were developed in order to evaluate the
skills students spontaneously employed in test-taking situations.
These materials were utilized in several studies to acquire this
information. Students were selected from two remedial and one
original program from each of grades 1 through 7. Students were
individually administered selected subtests of the Stanford
Achievement Test. They were asked for their level of confidence
for each answer and the strategies they had chosen for answering
the questions. It was determined that a complete hierarchy of
strategies existed with respect to answering test questions beyond
simply knowing or not knowing the answer, and that these
strategies resulted in differential levels of performance on the
part of the students. It was generally seen that younger students
and academically lower functioning students tended to produce
lower-level strategies than higher functioning and older students.
This investigation is described in detail in the manuscript in the
appendix entitled, "An Analysis of Children's Strategy Use on
Reading Achievement Tests". This manuscript has been accepted for
publication by Elementary School Journal. Additional evaluation of
the data from this investigation indicated the existence of a
developmental trend through the elementary grades in the use of
elimination strategies on ambiguous multiple choice items. That is,
as children got older, they became more proficient with
respect to their spontaneous ability to eliminate inappropriate or
obviously incorrect alternatives. These results have also been
described in detail in the manuscript entitled, "Developmental
Aspects of Test-Wiseness for Absurd Options: Elementary School
Children". This manuscript has been submitted for publication.

A test of "passage independence" of reading comprehension test
items on the Stanford Achievement Test was developed by
administering items from the Reading Comprehension subtest of the
SAT to college undergraduates. The purpose of this investigation
was to determine what proportion of these test items were
potentially answerable by employing prior knowledge or deductive
reasoning skills. It was determined that college undergraduates
were able to answer nearly 80% of these questions on the average,
with many students answering them all correctly. This article is
given in the appendix under the title, "Passage Independence in
Reading Achievement Tests: A Follow-Up," and has been published in
the journal Perceptual and Motor Skills.

Two follow-up investigations were intended to examine more
precisely the nature of test-taking strategies employed by learning
disabled students, specifically as compared with the strategies
employed by their non-disabled counterparts. In one investigation,
LD and non-LD students were administered items from the Stanford
Achievement Test, Reading Comprehension Subtest, with the actual
reading passages deleted from the test. Students were told to
simply answer the questions the best that they could. In the second
experiment, all items were read to both groups of students in order
to control for general reading ability. In both experiments,
students not classified as learning disabled scored significantly
higher on this test of "passage independent" test items than did
their learning disabled counterparts. These results (a) indicated
that learning disabled students may differ with respect to
spontaneous test-taking strategies, such as use of prior knowledge
and deductive reasoning skills, and (b) raise the issue of what such
test items are actually measuring, since they could be so easily
answered without having read the corresponding passage. This
investigation has been written in manuscript form, is in the
appendix under the title, "Are Learning Disabled Students Test-Wise:
An Inquiry into Reading Comprehension Test Items," and has been
submitted for publication.

In a second investigation, learning disabled and non-learning
disabled students were directly questioned with respect to
strategies they employed on reading comprehension test items and
letter sounds test items. In this investigation, it was found that
learning disabled students did not differ from their non-disabled
peers with respect to answering recall comprehension questions, with
ability to read controlled. However, learning disabled students
were less likely to employ appropriate strategies to answer
inferential questions, and reported inappropriately high levels of
confidence in their responses. In addition, when they did report
using appropriate strategies, they were much less likely to employ
them successfully. This project is described in detail in the
manuscript, "Spontaneously Employed Test-Taking Skills With Learning
Disabled Students on Reading Achievement Tests." This manuscript
has also been submitted for publication and was presented at the
annual meeting of the American Educational Research Association in
New Orleans in April.

In an investigation which has not yet been reported, it was
determined that a sample of elementary-age behaviorally disordered
students scored significantly lower (t(35) = 2.59, p < .01) than
their nonhandicapped counterparts with respect to reported attitudes
towards tests and the test-taking situation. These investigations,
taken together, provided valuable information regarding the
optimal training package to be developed for use with this
population.

An evaluation of all major achievement tests was also made in
order to determine whether tests were similar or different with
respect to format demands on the test takers. In this investigation,
all levels of six major achievement tests were evaluated for number
of format changes per minute throughout the reading achievement test
subtests. It was determined that achievement tests varied widely
with respect to format demands, with most format changes occurring
in the primary grades. These results are documented in the
manuscript, "Format Changes in Reading Achievement Tests:
Implications for Learning Disabled Students," which can be found in
the appendix and has been submitted for publication.
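
The "format changes per minute" index can be illustrated with a short sketch. The report does not state how a format change was counted, so the sketch assumes that a change occurs whenever two consecutive items differ in format; the function name, the format labels, and the 20-minute time limit are all hypothetical and are not taken from any of the six tests evaluated.

```python
def format_changes_per_minute(item_formats, minutes):
    """Rate of format changes for one subtest.

    item_formats: format label of each item, in test order (hypothetical labels).
    minutes: time allowed for the subtest.
    Assumption: a "format change" is any point where an item's format
    differs from that of the item immediately before it.
    """
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    changes = sum(
        1 for prev, cur in zip(item_formats, item_formats[1:]) if prev != cur
    )
    return changes / minutes


# Invented example subtest: six items, three changes of format.
subtest = ["letter_sounds", "letter_sounds", "word_match",
           "word_match", "cloze", "passage"]
rate = format_changes_per_minute(subtest, minutes=20)
print(rate)
```

Under this counting rule, a subtest that switches format often within a short time limit receives a higher index, which is the pattern the evaluation describes for the primary grades.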


In order to evaluate appropriately all previous attempts to
train test-taking skills in the elementary grades, a meta-analysis
was completed of all available studies in this area. It was
determined that although the general effect of training was
positive, differences in favor of training groups did not seem to
become substantial unless training was relatively extensive. In
addition, this meta-analysis revealed that low SES children and
primary grade children were more likely to benefit from extended
training hours. This seems to underline the importance in the
present project of implementing a package of a higher level of
intensity. The detailed results of this meta-analysis are given in
the appendix under the title, "Improving Achievement Test Scores in
the Elementary Grades by Coaching: A Meta-Analysis." This
manuscript has also been submitted for publication.

Finally, during the first part of the project, the scope of the
proposed research was described and published by Exceptional
Children in the fall of last year and is given in the appendix under
the title, "Research in Progress: Improving the Test-Taking Skills
of Learning Disabled and Behaviorally Disordered Elementary School
Children." In addition, during the fall, preliminary findings were
reported at the seventh annual conference of Severe Behavior
Disorders of Children and Youth in Tempe, Arizona in a presentation
entitled, "Training Behaviorally Disordered Children to Take
Tests."

It was the intention of all of the above investigations to
evaluate both tests and test-taking strategies of mildly handicapped
students in order to determine the most likely strategies for
intervention and the form that intervention should take. In all, it
was determined that mildly handicapped students do differ from
their nonhandicapped peers with respect to use of appropriate
strategies on standardized achievement tests. It was also
determined that these strategy deficits included use of prior
knowledge, use of deductive reasoning skills, attention to
appropriate distractors, and selection of strategies appropriate to
correctly answering different types of items.

2. Development and revision of training materials (September-
February, 1983-1984). Based upon results of the above investigations
and careful evaluation of the Stanford Achievement Test, materials
were developed which were intended to teach to second, third, and
fourth grade children in special education placements skills
appropriate to the successful taking of the Stanford Achievement
Test. These materials included eight scripted lessons and a student
workbook of exercises on subtests meant to be very similar to those
used on the Stanford Achievement Test. These materials were
intended to teach both general test-taking strategies, such as
efficient time usage, as well as specific lessons meant to increase
understanding of the particular test demands of the individual
reading subtests of the Stanford Achievement Test. These materials
are included with this report and are entitled "Super Score."

Following the preliminary development of materials, they were
pilot tested in November on two groups of second grade children with
learning and behavioral disorders. On the basis of this pilot
investigation, several revisions were made in the materials.
Specifically, some of the lessons proved to be too long for the most
effective implementation with this project, and some instructions
were judged to be ambiguous. In addition, a pre- and posttest
measure which was developed for use with this population was also
judged to be inadequate to effectively assess progress made on these
materials.

On the basis of the initial pilot investigation, the materials
were revised and expanded to include second to fourth grades, and
were then implemented in a larger field test involving \4 students
in special education placements in second, third, and fourth grades.
Students were randomly assigned to treatment and control groups at
each of the three grade levels, and the lessons were administered to
the treatment groups. Students in the experimental group were seen
to score higher than students in the control group on a shortened
version of the Stanford Achievement Test, Reading Subtest.
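
Random assignment to treatment and control groups at each grade level can be sketched as follows. This is only an illustration under stated assumptions: the report does not describe the actual randomization procedure, and the roster, student identifiers, seed, and the rule of splitting each grade as evenly as possible are all hypothetical.

```python
import random


def assign_within_grade(students, seed=0):
    """Randomly split students into treatment and control within each grade.

    students: list of (student_id, grade) pairs (hypothetical roster).
    Returns a dict mapping student_id to "treatment" or "control".
    Assumption: each grade is split as evenly as possible; with an odd
    count, the extra student goes to the control group.
    """
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    by_grade = {}
    for student_id, grade in students:
        by_grade.setdefault(grade, []).append(student_id)

    assignment = {}
    for grade in sorted(by_grade):
        ids = by_grade[grade]
        rng.shuffle(ids)
        half = len(ids) // 2
        for position, student_id in enumerate(ids):
            assignment[student_id] = "treatment" if position < half else "control"
    return assignment


# Invented roster: six students in each of grades 2, 3, and 4.
roster = [(f"s{grade}{i}", grade) for grade in (2, 3, 4) for i in range(6)]
groups = assign_within_grade(roster)
```

Assigning within grade keeps the two conditions balanced at each grade level, so that a grade-level difference in test form or difficulty does not fall unevenly on one condition.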

These findings were not conclusive due to the small number of
subjects employed in each grade, and the fact that different forms
of the Stanford Achievement Test appeared to have been
differentially difficult for different grade levels, the result
being a differential level of difficulty on the posttest measure.
Although statistical significance was not found, it was determined
that students in the experimental group had scored .48 standard
deviation units higher than students in the control group on the
Reading Achievement Word Study Skills and Reading Comprehension
subtests. This effect size, had it been a reliable one and occurred
in the primary grades on an actual test administration, would have
been equivalent to a four- or five-month gain score for students who
had received the training. In addition, analysis of pre- and
posttests of test-taking skills indicated that the materials had in
fact been effective in training these particular skills.
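
An effect size expressed in standard deviation units, such as the .48 reported above, is the difference between group means divided by a standard deviation. The sketch below uses the pooled standard deviation of the two groups. That choice is an assumption, since the report does not say which standard deviation was used, and the scores are invented for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_mean_difference(treatment, control):
    """Difference between group means in pooled standard deviation units.

    Assumption: the pooled SD of the two groups is the denominator;
    the report does not specify this, and the control-group SD is
    another common choice.
    """
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


# Invented posttest scores, not data from the project.
treatment_scores = [12, 15, 14, 16, 13]
control_scores = [11, 14, 13, 15, 12]
d = standardized_mean_difference(treatment_scores, control_scores)
print(round(d, 2))
```

Because the index is scale-free, it allows a difference on a shortened test form to be compared with gains on a full administration, which is how the paragraph above arrives at its estimate of a four- or five-month gain.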

Some final revisions were made of the training materials on the
basis of the second field test, and materials were finally prepared
for spring implementation immediately prior to district-wide
standardized test administration. While final revisions were being
made, individual schools were contacted to be involved in a larger
experimental study intended to validate these materials. For this
study, approximately 110 students enrolled in special education
classes in grades 2, 3, and 4 in two different large elementary
schools were selected and randomly assigned to treatment and control
conditions. Four persons, including the principal investigator,
took part in the two-week training period which was administered at
the end of March. This training was administered in eight 20- to
30-minute sessions given from Monday to Thursday for each of two
weeks immediately prior to district-wide test administration. At
the same time, materials were developed intended to increase
test-taking skills on the Comprehensive Test of Basic Skills and were
administered in the school districts adjacent to Utah State
University. This training package was implemented in local third
grade classes in order to determine (a) whether these procedures
were appropriate for whole-class administration, (b) whether the
materials developed for the Stanford Achievement Test could be
easily adapted to other tests, and (c) whether such training could
be seen to have an impact upon test scores, attitudes, and time
on-task during test administration.

The results of the training on the Comprehensive Test of Basic
Skills in the local third grade classes indicated that student
attitudes had, in fact, qualitatively improved as a result of the
test training. It was suggested that the test training had resulted
in a more normal distribution of attitudes after the end of the
three days of testing, and implied that the training had made the
test-taking experience itself less traumatic on the part of third
grade regular classroom students (including 15% mildly handicapped
students). Time on-task during directions and during the
test-taking experience itself did not seem to be affected by the
training package. In addition, the training was seen to significantly
increase the scores of students in the lower half of the class on
the Word Attack subtest of the reading test. Analysis of the top
half, or the group as a whole, was not possible due to the presence
of strong ceiling effects in both experimental and control groups.
This investigation has been written in manuscript form and is given
in the appendix under the title, "The Effects of Training on the
Standardized Test Performance, On-Task Behavior, and Attitudes of
Third Grade Children." This manuscript has been submitted for
publication.
Results of the training package with second, third, and fourth
grade special education students also indicated that the training was
successful in improving scores on standardized achievement tests.
Although only descriptive differences were seen in some subtests,
the training package significantly improved the performance of the
experimental students over control students in the Word Study Skills
subtest. This improvement was judged to be approximately equivalent
to a three- to four-month increase in equivalent grade level. The
fact that improvement in the Word Study Skills subtest was observed
was considered to be due to the fact that this particular subtest
involved many smaller subtests, several format changes, and
potentially confusing directions for which the training package was
thought to have been particularly helpful. Descriptive differences
were seen in other subtests of the SAT but, because these were not
statistically significant, it is not possible to determine whether
they were a result of the training or simply sampling error.
Evaluation of scores of the second grade students indicated that they
apparently had not benefited from the training package. However, the
differentially small number of subjects in the second grade sample,
attrition suffered during the training, and the fact that these two
groups were in retrospect found to have differed with respect to the
previous year's testing, obscure clear interpretation of this data.
It may be, for example, that second grade LD and BD students have
insufficient reading and other academic skills to enable them to
benefit from this training package, or it could be that these
students had in fact benefited but that due to sampling and
attrition problems, these benefits were not observed. This entire
investigation has been described in detail and is given in the
appendix under the title, "Training Test-Taking Skills to Learning
Disabled and Behaviorally Disordered Students," which has been
submitted for publication.

Conclusions 

The major findings of the year's research suggest that: (a)
mildly handicapped students differ from their nonhandicapped peers
with respect to spontaneously employed test-taking strategies and
attitudes toward the test-taking situation, and (b) that these
test-taking skills and attitudes can be significantly improved by
training. These findings indicate that for children classified as
learning disabled or behaviorally disordered, achievement test
scores often may not be as accurate a measure of actual academic
performance as is possible. It also seems to indicate that training
to increase test-taking skills and attitudes towards tests may
significantly increase the individual handicapped student's
functioning on these tests.

A case can be made that norm-referenced tests are not solely
relied upon in making placement decisions, and that in fact other
individually administered tests are better indicators of specific
skill deficits with teaching implications. It is true, however,
that these students deserve to be taught basic skills that they may
lack in any particular area, including taking standardized, group
administered achievement tests, and that if their poor performance
can be improved at all, this seems to indicate that substantial
error has been reduced from the tests. Any such improvement then is
judged by the present project personnel to be worthwhile.

Several questions, however, remain to be investigated by the
present project. First, whether or not this type of training is
likely to result in increased scores on math subtests is completely
unknown and, in fact, cannot be determined on the basis of the
present investigation. In addition, the extent to which
secondary-age learning disabled and behaviorally disordered children
are deficient in test-taking skills and attitudes and to what extent
these may be trainable also cannot be concluded in the present
investigation. It is the purpose of the project during the second
year to investigate test-taking deficiencies on math subtests and
corresponding potential for training, and, in the third year, to
evaluate test-taking characteristics of secondary-level learning
disabled and behaviorally disordered students. It is the hope of
this project that by the third year of funding, general conclusions
can be made with respect to mildly handicapped learners of all ages
and several different types of achievement tests. It is hoped that
this information will be of benefit to many special educators, and
particularly students in special education classes, throughout the
country.
RESEARCH IN PROGRESS

Department Editor

Improving the Test-Taking Skills of
LD and BD Elementary Students

Principal Investigators: Cie Taylor and Thomas
Scruggs, Exceptional Child Center, Utah State
University.

Purpose/Objectives: The purpose of this investigation
is to determine whether reinforcement techniques
and direct training in test-taking skills can
increase the validity of test scores for learning
disabled (LD) and behaviorally disordered (BD)
students. To determine the degree to which LD and BD
students exhibit inappropriate (inefficient)
test-taking skills, students are observed and interviewed
while taking standardized tests. Based on those
observational data, procedures and training packages
will be designed to increase student performance
on standardized achievement tests. If the procedures
and training are effective, educational decisions,
which are frequently based in part on the
results of standardized achievement tests, will be
more valid because problems in areas such as
test-taking skills, student motivation, and confusion due
to testing format will be reduced or eliminated.

Subjects: Subjects are 100 elementary students
enrolled in 12 resource rooms and self-contained
classrooms for children with learning disabilities and
behavioral disorders.

Methods: LD and BD children matched on age,
handicap, and standardized achievement test score
will be randomly assigned to experimental and
control groups. Students in the experimental group will
receive materials and procedures designed to
improve the ability of handicapped students to take
tests. Experimental and control groups will be
compared statistically on several measures, including
attitudes toward test-taking, student and teacher
behavior during test administration, and actual
performance on standardized tests of reading
achievement. In following years, materials will be
developed and implemented for mathematics
achievement tests and test-taking skills for secondary-age
handicapped students.
Results to Date: Preliminary findings indicate that
many LD and BD children, as well as low achieving
nonhandicapped students, do not spontaneously
exhibit efficient test-taking behaviors. Specifically,
handicapped children have been seen to exhibit
difficulties with item format and distractors more
typical of naive test takers.

Commencement and Estimated Completion
Dates: This investigation began July 1, 1983 and is
expected to continue for three years.

Funding: Funding for this investigation has been
provided by a grant from the U.S. Department of
Education, Research in Education of the
Handicapped.

Publications/Products Available: Preliminary
materials for improving test-taking skills, piloted on
nonhandicapped second-grade students, have been
developed and will be revised for use with
handicapped children during the coming year.
Manuscripts documenting the investigation will be
completed and submitted for publication during the
second half of the academic year. Please write the
authors for further information.

"Research in Progress" is a forum for reporting
ongoing research in the field of special education
that has not yet been published. Investigators
wishing to report studies in progress are invited
to submit a brief synopsis of their efforts to the
column editor, Charles C. Cleland, 3427 Monte
Vista, Austin TX 78731. Reports are to be submitted
in triplicate and should follow the format
shown above, with a maximum length of 500
words.
An Analysis of Children's Strategy Use on
Reading Achievement Tests

Much of what constitutes reading instruction in today's public schools
reflects students' scores on standardized achievement tests. Test
performance may influence later assignment into reading groups or
classrooms, or remedial or special education programs. Although
norm-referenced reading tests have been criticized as being insensitive to
specific skill deficits and inadequate as complete diagnostic measures
(Howell, 1979), most reading tests have nonetheless been seen to be highly
reliable and valid (Spache, 1976). For better or worse, standardized
reading tests are very much a part of education today and will most likely
continue to be used in the future.

If important decisions are to be based on the results of standardized
reading tests, student scores should provide the best possible estimate of
reading performance. Unfortunately, the results of past research have
indicated that student reading test performance can be influenced by factors
other than knowledge of test content (e.g., Taylor & White, 1982). One of
these factors, test-wiseness (TW), was first described in detail in 1965 by
Millman, Bishop, and Ebel as "a subject's capacity to utilize the
characteristics and formats of the test and/or the test-taking situation to
receive a high score" (p. 707). Millman et al. developed an outline of
test-wiseness principles, which included time using strategies, error
avoidance strategies, guessing strategies, and deductive reasoning
strategies. Slackter, Koehler, and Hampton (1975) presented information
which suggests that TW has a developmental component. That is, students may
become more "test-wise" as they grow older. Generally, researchers have
inferred extent of TW on the basis of tests specifically constructed for
this purpose.

Recently, students themselves were questioned about strategies they use
to answer test questions. Haney and Scott (1980) administered a number of
achievement tests to 11 students, then questioned each student the following
day concerning the manner in which they attempted to answer each item.
These researchers developed a complex model with which responses to
interviewer questions were classified into 46 separate categories. Most of
these categories included the use of some specific strategies such as
guessing, elimination of alternatives, or "reasoning." Their results
indicated that children use a wide range of strategies in answering test
questions and that often the child's perception of item content bears little
resemblance to the intention of the author of the test. Haney and Scott
concluded that considerable "ambiguity" exists in standardized test
questions and that it exists to a greater extent in science and social
studies areas, and to a lesser extent in reading areas.

The work of Haney and Scott contributed significantly to our knowledge
of the nature of ambiguous test items. The focus of their study, however,
was on test construction, with implications concerning the reduction of test
item ambiguity. Although classroom teachers may use the results of Haney
and Scott to improve their own tests, published standardized tests cannot be
altered by teachers. A question which remains concerns the extent to which
students employ "test-taking" strategies when faced with difficult or
ambiguous items. Do students spontaneously use such strategies (that is,
without being trained)? If so, which strategies (if any) are effective in
obtaining correct answers? No previous research has been located to answer
these questions.

To address those questions in the present study, the reading test
performance of elementary school children was examined. Specifically, two
areas were investigated: (a) the strategies spontaneously employed by
students to answer reading test items, and (b) the relative effectiveness of
these strategies in increasing reading test scores.

Procedure

A sample reading test based upon items from the Stanford Achievement
Test (SAT) was developed and piloted on five students to evaluate whether
the length was appropriate and to establish reliable scoring conventions.
This sample test included items from the Word Reading, Reading
Comprehension, Word Study Skills, and Vocabulary subtests. After revisions
had been made, it was administered to 31 elementary age Caucasian students
(15 girls, 16 boys) attending summer classes in a western rural area.
Students were selected from both remedial and "enrichment" classes so that a
range of abilities was represented. Twenty students were seen to read at or
above grade level; 11 were seen to read below their grade level as assessed
by the Woodcock Reading Achievement Test. Most students (20) were second or
third graders, but students were also selected from grades 1 (2), 4 (2), 5
(5), and 6 (2).

All students were seen individually by one of four examiners. One
examiner interviewed 18 students, while the other three interviewed 4, 6,
and 2 students. First, students were given the Woodcock Reading Achievement
Test, Passage Comprehension subtest, in order to identify an approximate
reading comprehension grade equivalent. Students were then given selections
from the SAT taken from the level one year higher than their assessed grade
level on the Woodcock subtest. In this manner, a similar difficulty level
was provided for each student. Most students were able to answer correctly
approximately two-thirds of the test questions.

Students were then told to read aloud each test question (as well as 
the reading passages in the reading comprehension subtest), and to read 
aloud whichever of the distractors they chose to read. They were neither
encouraged nor discouraged from reading each distractor. As soon as
students had answered a test question, they were asked to rate their level 
of confidence in their response: were they very sure, somewhat sure, or not 
sure the answer they had given was correct? After students had finished 
each subtest, they were asked to re-read the questions and tell the examiner
why they had chosen the answer they did. The examiner recorded reading 
errors, confidence levels, attention to distractors, reference to reading 
passage, and reported strategies. Sessions were tape recorded to clarify 
any later ambiguity in scoring. Students spent 45-90 minutes in the session 
and answered 31-42 test questions. Some students received more questions 
than others because different levels of the SAT required different subtests
and formats.

Results

Effectiveness of Strategies

We found that all strategy responses could be classified within a 10-level
hierarchy which strongly predicted probability of correct responding.
Proportions of correct responses were computed across subjects for each type
of strategy used and are shown in Figure 1.

Insert Figure 1 about here

These classifications were as follows: (a) skipped (student skipped the
item), (b) misread a key word in question or distractors, (c) used faulty
reasoning (example: one student reported, "this word must be the correct
answer because it has a period after it"), (d) didn't follow directions,
(e) guessed, (f) "seemed right" (student thought the answer was correct
without being able to state an explicit reason), (g) used external
information (example: "I know most people in fires die from breathing smoke
because a fireman told me that"), (h) eliminated inappropriate alternatives,
(i) referred to passage, and (j) clearly "knew" the answer (example: "I knew
that a pear is a kind of fruit"). The existence of these strategies
indicated that a complete hierarchy of test-taking skills exists beyond
simply knowing or not knowing the answers, and these strategies can be more
or less effective on a standardized reading test. As seen in Figure 1, for
example, when students
skipped an answer, they got none correct; when they guessed, they got 37%
correct; when they eliminated alternatives, they got 67% correct.

Proportions of strategies employed are given in Table 1.

We collapsed these strategies into five logical categories (skipping,
procedural error, guessing strategy, deliberate strategy, and "knowing") and
computed point biserial correlations for each subject. The median
correlation between item score and reported strategy was .54 (p < .01), a
correlation of moderate strength which indicated that over 30% of the
variance in test performance was held in common with the level of test-
taking strategy employed.² No differential effects were seen by age, ability
level, or examiner, although the sample was too small to conclusively
investigate these possibilities.
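
As a rough illustration of the statistic reported above, the following Python sketch computes a point-biserial correlation between a binary item score and a numeric strategy level. The function name, the 0-4 coding of the five collapsed categories, and the sample data are all hypothetical; they are not the study's data or its scoring procedure.

```python
from math import sqrt


def point_biserial(binary, values):
    """Point-biserial r between a 0/1 variable and a numeric variable."""
    if len(binary) != len(values):
        raise ValueError("inputs must have the same length")
    if any(b not in (0, 1) for b in binary):
        raise ValueError("binary variable must contain only 0 and 1")
    ones = [v for b, v in zip(binary, values) if b == 1]
    zeros = [v for b, v in zip(binary, values) if b == 0]
    if not ones or not zeros:
        raise ValueError("both outcomes (0 and 1) must occur")
    n = len(values)
    mean_all = sum(values) / n
    # Population standard deviation of the numeric variable.
    sd = sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("the numeric variable has no variance")
    p = len(ones) / n  # proportion of items scored 1
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd * sqrt(p * (1 - p))


# Hypothetical data for one student: item score (1 = correct) and the
# collapsed strategy category, coded 0 (skipping) to 4 ("knowing").
item_correct = [0, 0, 1, 0, 1, 1, 1, 1]
strategy_level = [0, 1, 1, 2, 2, 3, 4, 4]

r = point_biserial(item_correct, strategy_level)
shared_variance = r ** 2  # proportion of variance held in common
```

Squaring the coefficient gives the proportion of variance the two variables share, which is the quantity the paragraph above interprets.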

An inspection of Figure 1 reveals some other interesting findings.
Notable is the high proportion of correct scores for guessing. Since number
of answer choices varied between subtests and levels, with four choices the
most common format, probability of correct responding by chance alone was
estimated at .28. In fact, when students reported guessing, they scored 37%
correct. That "guessing" responses scored virtually the same as "seemed
right" responses suggests that even when students believe they are guessing,
they still have some idea of what the correct answer might be and can use
this strategy to advantage. "Seemed right" responses were common on the
vocabulary subtests in which students often reported that a particular
definition sounded correct, but were otherwise uncertain. Another
interesting finding is the high proportion of correct responses when the
student reported using outside information or experience. Although content
area tests such as science and social studies directly test outside
knowledge, reading tests ostensibly are intended to test nothing other than
knowledge of the content provided in the passage. So, although use of
outside information should not help, in fact, students benefited from the use
of such information. (It should be noted, however, that when students
referred to the passage, they scored even higher.) What is surprising is
that students were able to use outside information as effectively as they
did. This finding underlines the problems in "passage independence" of
reading comprehension items so well investigated by researchers such as
Tuinman (1973-1974).
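
The chance level mentioned above can be approximated as the average of 1/k over items, where k is the number of answer choices on each item. The sketch below assumes a hypothetical mix of three- and four-choice items; the report does not give the actual item counts, so the numbers are for illustration only.

```python
def chance_rate(items_by_choice_count):
    """Expected proportion correct under blind guessing.

    items_by_choice_count maps the number of answer choices (k) to the
    number of items that offer k choices.
    """
    total_items = sum(items_by_choice_count.values())
    if total_items == 0:
        raise ValueError("at least one item is required")
    # Each k-choice item contributes an expected 1/k correct answers.
    expected_correct = sum(n / k for k, n in items_by_choice_count.items())
    return expected_correct / total_items


# Hypothetical test: 12 three-choice items and 24 four-choice items.
hypothetical_mix = {3: 12, 4: 24}
estimated_chance = chance_rate(hypothetical_mix)
```

With four choices as the most common format, a minority of three-choice items pulls the estimate somewhat above .25, which is the direction of the estimate quoted in the text.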

Level of Confidence as a Variable

Students had a reasonably good idea of whether they had answered a test
question correctly or not. When students reported being "very sure" their
answer was correct, they were in fact correct 81% of the time. When they
reported being "somewhat sure," they were correct only 13% of the time, and
when they reported being "not sure," they obtained correct answers in only
7% of the cases. These figures are somewhat misleading, however. If looked
at another way, the results seem different: when students answered
incorrectly, they also reported being "very sure" the answer was correct in
56% of the cases. Clearly, level of confidence in itself, although related
to performance, is not a sufficient check on correctness of a student's
work. The relation of confidence to correctness of response was seen
to vary widely from student to student, with a median point biserial
correlation of .29 (p > .05). In many cases, then, other means are
necessary for students to assess the correctness of their responses. These
means will be described below.
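
The two ways of reading the confidence data condition on different things: one asks how often a "very sure" answer was correct, the other asks how often an incorrect answer had been rated "very sure." The following sketch makes that distinction explicit. The counts are invented for illustration and are not the study's data; the function names are likewise hypothetical.

```python
def proportion_correct_given(counts, level):
    """P(correct | confidence level)."""
    correct, incorrect = counts[level]
    total = correct + incorrect
    if total == 0:
        raise ValueError("no responses at this confidence level")
    return correct / total


def proportion_level_given_incorrect(counts, level):
    """P(confidence level | answer was incorrect)."""
    total_incorrect = sum(incorrect for _, incorrect in counts.values())
    if total_incorrect == 0:
        raise ValueError("no incorrect responses")
    return counts[level][1] / total_incorrect


# Hypothetical (correct, incorrect) response counts by reported confidence.
counts = {
    "very sure": (80, 20),
    "somewhat sure": (12, 10),
    "not sure": (8, 10),
}

sure_and_right = proportion_correct_given(counts, "very sure")
wrong_but_sure = proportion_level_given_incorrect(counts, "very sure")
```

In this invented example most "very sure" answers are correct, yet half of all errors were also made with full confidence, so both statements can hold at once.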
The Cost of Carelessness 

In addition to reported test-taking strategies, information was also
collected on the degree to which the students attended to distractors and
referred to the reading passage on the reading comprehension subtest in
choosing their answer. Interestingly, students referred to the reading
passage only very rarely, even though when they did refer, they stood a very
good chance of answering the question correctly. It was found that when
students answered a reading comprehension question incorrectly, in 89% of
the cases students had not referred back to the passage which clearly
contained the correct answer. This, of course, does not mean that all of
these questions could have been answered correctly, but it does appear that
reading scores would be much improved by students' increased attention to
the passage.

Similarly, a great deal of carelessness was observed in attention to
all distractors. When students answered incorrectly, in 40% of the 302
cases they had not read all distractors. Again, this finding does not mean
all these questions could have been answered correctly by greater attention
to distractors, but the score could almost certainly have been improved by
so doing. When students answered questions correctly, they had attended to
all distractors in 73% of the 577 cases. It does appear, then, that test
performance can be improved through greater attention to distractors.

Another surprising finding was the relatively small effect of reading
errors. Although performance was clearly impaired when students misread a
word of key importance (see Figure 1), misreading words in general had less
detrimental effect than might be expected. When one or more words in stem
or distractor were misread, the proportion of items answered correctly (58%
of 293) was still quite high. Clearly, many students have developed
strategies for coping with words they cannot read. It seems important, then,
that students be reminded not to "give up" if they cannot read every word.
As seen in the present investigation, students are often able to answer
correctly even though they were not able to read every word.

One final finding concerning carelessness can be reported. All
examiners noted the extent to which students had attended to the wrong
stimulus in the "word study skills" subtest. In this subtest, students are
given a word with an underlined sound, and asked to find the same sound in
one of three distractors. For example, in the following problem:

     Prize

     (a) prince
     (b) size
     (c) seven

the correct answer is (b) since the "z" in "size" has the same sound as the
underlined "z" in "prize." What was surprising to the present investigators
is the fact that students so often attended to the wrong stimulus, for
example, the initial "pr" in the above question. Although exact incidence
of these errors cannot be given, their consistent occurrence seems to imply
that teachers should stress the importance of attending to the underlined
sound only.

Conclusions


The results of this study have demonstrated that students do employ
specific strategies to cope with test item ambiguity and with indecision or
lack of knowledge in selecting correct answers. Important implications can
be drawn from these findings which have a direct bearing on student
performance during testing. To attain the most correct answers, students
should employ the strategies listed below:

1. Never skip an answer.

2. Be certain to attend to all distractors and refer to the reading
   passage, even if you are "very sure" your answer is correct.

3. If you are having great difficulty reading a passage, read the
   questions and try to answer them anyway. Often, your own knowledge
   can help you choose an answer. If you have difficulty with some
   words in the question or distractors, answer anyway and base your
   answers on the words you can read.

4. If you have attended to all parts of a passage and test question
   and still do not know an answer, there is still a good chance of
   getting the correct answer if you guess.

5. Be certain you are attending to the appropriate stimulus, such as
   the underlined sound in a "word study skills" subtest. As in other
   subtests, wrong answer choices are given which may look correct at
   first glance.

6. Make sure you answer every item, even if you must hurry and guess a
   lot near the end. You will probably get some of the answers
   correct.

Given the results of past research (Bangert, Kulik, & Kulik, 1983), it
is likely that to significantly affect test performance, a teacher will have
to do more than simply read the above points to students. Examples and
practice activities will help develop these "test-taking" skills.

These findings are of interest to special education, particularly the
area of learning disabilities. Many children are referred for special class
placement on the basis of deficiencies seen in standardized reading tests.
Special education is often quite beneficial to students who clearly need it,
but before taking such a dramatic step, it should be known for certain that
the student's score reflects the best abilities of the student, rather than
a problem with test-taking in general.

Overall, the present investigation indicated that a range of abilities
exists in test-taking skills, as it does in other areas. The specific
skills observed in efficient students taking a reading test should be
practiced by all students, if tests are to be as valid as possible. If test-
taking skills are incorporated in general test administration procedures, it
appears that maximum benefit can be derived from the use of standardized
reading tests.

2. A point-biserial, rather than a Spearman correlation of ranks
coefficient, was computed out of concern for the necessarily high
number of ties resulting in computing a rank correlation with binary
data. The obtained Spearman coefficient, .55, however, differed by
only one point from the obtained point biserial of .54.

Table 1

Frequencies and Percent of Strategies Employed

Strategy level                    Frequency    Percent

0. Skipped Item                        9          1.0
1. Misread Keyword                    23          2.6
2. Faulty Reasoning                   38          4.3
3. Did Not Follow Directions           7          0.8
4. "Seemed Right"                     92         10.5
5. Guessed                           127
6. Used External Evidence             21          2.4
7. Eliminated                         45          5.1
8. Referred To Passage                59          6.7
9. Clearly "Knew"                    458         52.1

Figure 1. Proportion correct by strategy used.

Strategy Classifications:

0. Skipped Item
1. Misread Keyword
2. Faulty Reasoning
3. Didn't Follow Directions
4. "Seemed Right"
5. Guessed
6. Used External Evidence
7. Eliminated
8. Referred to Passage
9. Clearly "Knew"

Abstract

Twenty-eight students from grades 1 through 5 were administered a test of
test-wiseness for absurd options. Results suggested that a developmental
trend may exist in test-wiseness for elementary-age school children.

Developmental Aspects of Test-Wiseness for Absurd
Options: Elementary School Children

First discussed by Thorndike in 1951, test-wiseness (TW) was described
in detail by Millman, Bishop, and Ebel (1965), and defined as "a subject's
capacity to utilize the characteristics and formats of the test and/or test-
taking situation to receive a high score" (p. 707). They further described
TW as "logically independent of the examinee's knowledge of the subject
matter for which the items are supposedly measures" (Millman et al., 1965,
p. 707). Ebel (1965) has suggested that error in measurement is more likely
to be obtained from students low in test-taking skills. The student low in
TW, therefore, may be more of a measurement problem than the student high in
TW (Slakter, Koehler, & Hampton, 1970b).

Some investigations have indicated that TW has a developmental
component; that is, that TW increases with age. Slakter, Koehler, and
Hampton (1970a) administered a measure of TW to students from grades 5-11
and found a significant overall linear trend for grade level. Crehan,
Koehler, and Slakter (1974) administered a TW test to students in grades 7
through 11, and a follow-up test to the same students two years later.
Increases over all intervals except grades 9 to 11 were found. In a second
follow-up of the same students, Crehan, Gross, Koehler, and Slakter (1978)
replicated the previous findings and concluded that although TW increases by
grade, large individual differences exist within grade levels.

Although the above investigations provide strong support for a
developmental component of TW in the secondary grades, as yet no
investigation has evaluated the developmental nature of TW in the elementary
grades. The present investigation is intended to address this question.

Method

Subjects were 28 elementary school-age children attending summer
classes prior to entering grades 1 through 5 in a western rural community.
Students (1 first grader, 9 second graders, 11 third graders, 2 fourth
graders, and 4 fifth graders) were selected from both remedial and
"enrichment" classes so that a variety of ability levels was sampled.

Students were seen individually by one of four examiners. First, they
were administered a five-item test of TW. This test was developed to
measure the ability of students to eliminate options known to be incorrect
(corresponding to the Millman et al., 1965 TW category I-D-1, absurd
options). For example, one of the items was the following:

     Good airplane pilots must be able to
     ______ quickly in an emergency.

     1. fall asleep     3. sturnate
     2. scream          4. thing

Students were orally provided with words they were unable to read. Since it
was thought that evidence of TW would be more subtle in an elementary school
population than it was in studies of secondary students, some departures
were made from the procedures of Crehan et al. (1974). First, students were
directly questioned regarding the reasons for their answer choices following
completion of the test. Second, students were scored as reporting no
elimination strategies (0), or reporting one or more strategies (1),
regardless of the "correctness" of their answer to each test question.

Results and Discussion
A point-biserial correlation was computed between entering grade level
of student and presence or absence of reported elimination strategies. The
resulting coefficient, .44, was statistically significant (p < .02) and
represented a moderate relation between grade level of student and reported
use of elimination strategies, accounting for approximately 20% of total
variance. Proportion of students reporting use of elimination strategies by
grade level is given in Figure 1.
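
The "approximately 20% of total variance" figure follows from squaring the correlation coefficient. A minimal worked sketch of that arithmetic, with a hypothetical helper name:

```python
def shared_variance(r):
    """Proportion of variance accounted for by a correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r ** 2


# Coefficient reported in the text for grade level and elimination strategy.
reported_r = 0.44
proportion = shared_variance(reported_r)
```

Here the squared value is about .19, which the text rounds to approximately 20%.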



Insert Figure 1 about here 



Thus, it appears that a developmental trend in one aspect of TW can be
observed in children of elementary school age, and that this trend is
similar to that seen in older students. These findings must be interpreted
with caution, however, due to the limited sample size, as well as the fact
that only one aspect of TW was measured. Although further research is
needed, the results of this preliminary investigation suggest that students
begin to learn TW skills as early as the primary grades, and that these
skills continue to improve with age.

1. The author would like to thank Karla Bennion, Steve Lifson, Dr. Jay
Monson, and the staff of the Edith Bowen School for their assistance on this
project.

Figure Caption

Figure 1. Proportion of students reporting elimination strategies by
grade level.

Summary

It has been seen that children's scores on reading achievement
tests vary not only with knowledge of content but also with the
differing formats of test items. Teachers working with learning
disabled children or children with attention problems may wish to
choose standardized tests with fewer rather than more format
changes. The present study evaluated the number of format and
direction changes, across tests and grade levels, of the major
elementary standardized reading achievement tests. The number of
format changes varies from one change every 3.2 minutes on the
California Achievement Test Level 13 to one change every 40
minutes on the upper levels of the Metropolitan Achievement Test.
Teachers may wish to take this into account when considering
standardized reading achievement tests for their students.

Format Changes in Reading Achievement Tests

It has been seen that the format of achievement test items
has an effect on children's test scores (Benson & Crocker, 1979;
Carcelli & White, 1981). In one study of reading achievement,
children's responses to items with the same content but in
different format varied from 45% to 92% correct (White, Carcelli,
& Taylor, 1981). Children in grades lower than the fifth grade
have attained significantly lower test scores when the major
format change of using a separate answer sheet is introduced
(Harcourt Brace Jovanovich, 1973; Ramseyer & Cashen, 1971; Cashen
& Ramseyer, 1969). Learning disabled children, children with
attention problems, and children functioning below grade level
may be even more adversely affected by format changes.

Given the extent to which different formats inhibit correct
responding, and the lesser ability of children at earlier
developmental stages to adjust to major format changes, teachers
of such students may wish to consider a reading achievement test
with less frequent rather than more frequent format changes.
Teachers will prefer to use tests on which a student's scores are
affected only by knowledge of content, not the ability to adjust
quickly to format changes. Since format has been shown to be a
variable influencing test performance, this investigation
intended to compare the number of format changes, across tests
and grade levels, of the major elementary standardized reading
achievement tests.

Procedure

Reading subtests of the following standardized tests were
analyzed for format changes: the Stanford Achievement Test (SAT)
levels Primary 1, Primary 2, Primary 3, Intermediate 1,
Intermediate 2; the California Achievement Tests (CAT) levels
10-19; the Metropolitan Achievement Tests (MAT) levels Primary 1,
Primary 2, Primary 3, and Elementary; the Iowa Tests of Basic
Skills (ITBS) levels 7-14; the Comprehensive Tests of Basic
Skills (CTBS) levels A-G; and the SRA Achievement Series levels
A-D.


A format change was defined as a variation in the number of
options per item, a change from column to row or row to column, or a
change in either stem or options from word to picture to passage
to question to cloze item. Comparisons across tests and grade
levels were made by dividing the time allowed by the number of
formats in the test. For example, 20 minutes/4 formats means
that in this case, there is a format change every 5 minutes.
Interrater agreement was calculated at 91%.
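
The comparison index described above can be sketched in a few lines of Python. The representation of an item's format as a (stimulus type, number of options) pair and the item sequence below are assumptions made for illustration; the study's raters worked from the printed tests, and their coding scheme is not reproduced here.

```python
from itertools import groupby


def count_formats(item_formats):
    """Count runs of consecutive items that share the same format."""
    return sum(1 for _ in groupby(item_formats))


def minutes_per_format(time_allowed_minutes, n_formats):
    """Time allowed divided by the number of formats in the test."""
    if n_formats <= 0:
        raise ValueError("a test must contain at least one format")
    return time_allowed_minutes / n_formats


# Hypothetical 20-minute subtest whose items fall into four format runs.
item_formats = (
    [("picture", 3)] * 5
    + [("word", 3)] * 5
    + [("word", 4)] * 6
    + [("passage", 4)] * 8
)

n_formats = count_formats(item_formats)
index = minutes_per_format(20, n_formats)
```

For this invented subtest the index reproduces the worked example in the text: four formats in 20 minutes, or one format every 5 minutes.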

Results 

The standardized test with the least number of format or
direction changes is the Metropolitan Achievement Test, which
averages one format change every 27 minutes. The MAT upper
levels have only one change every 40 minutes. The test with the
greatest number of changes is the California Achievement Test,
with a format change every 9.1 minutes. The CAT level 13 for
second and third graders has a format change every 3.2 minutes.
The results for all tests and all levels are presented in Table
1.

Insert Table 1 about here

The mean of the format changes across grade levels varies 
from one change every 8.8 minutes at grades 2-3 to one change 
every 15.7 minutes at grades 5-7. These results are summarized 
graphically in Figure 1.



Insert Figure 1 about here 



Discussion

Children's test scores vary not only with knowledge of
content, but also with the differing formats of test items.
Teachers of children with difficulties may wish to consider
standardized tests with fewer rather than more format changes.
The number of format changes on the major standardized reading
achievement tests varies from one change every 3.2 minutes on the
CAT level 13 to one change every 40 minutes on the upper levels
of the MAT. If a teacher suspects that students have difficulty
adjusting to new formats, she or he will prefer to use a test
which allows a reasonable amount of time before switching to a
different format.

Table 1

Minutes Per Format and Direction Changes

Test      Level      Minutes/format change

CAT       10             19.3
          11             11.4
          12              6.1
          13              3.2
          14-19           9.0

CTBS      A               7.6
          B               7.5
          C               8.1
          D               8.0
          E               8.8
          F               6.7
          G               6.7

ITBS      7               6.8
          8               6.2
          9-14           19.0

MAT       P1             15.0
          P2             40.0
          El             40.0
          Int            40.0

SAT       P1             21.2
          P2             15.0
          P3             20.0
          I1           [?]1.3
          I2             21.3

SRA       A              13.9
          B              16.4
          C              14.2
          D              24.0

Figure Caption

Figure 1.
Number of minutes per format change.

Abstract

Previous research has indicated that students in many cases can
answer reading comprehension test questions correctly without
having read the accompanying passage. The present research
compared, in two experiments, the ability of learning disabled
(LD) students and more typical age peers to answer such reading
comprehension questions presented independently of reading
passages. In Experiment 1, LD students scored appreciably lower
under conditions resembling standardized administration
procedures. In Experiment 2, reading decoding ability was
controlled for; however, the performance differential remained the
same. Results suggested a relative deficiency on the part of LD
students with respect to reasoning strategies and test-taking
skills. In addition, the validity of some tests of "reading
comprehension" was discussed.

Are Learning Disabled Students "Test-Wise?":
An Inquiry Into Reading Comprehension Test Items

For many years, there has been some argument over what
reading comprehension tests "really" measure (e.g., Thorndike,
1973-1974). The most commonly observed standardized reading
comprehension item format consists of a passage and a number of
associated multiple choice questions. Reading and understanding
the passage is assumed to be a necessary precondition to
correctly answering the questions. After examining the
literature, however, one is forced to question the assumption of
question dependence on the stimulus passage. Preston (1964) found
that college students were able to answer reading comprehension
items with the passages blacked out at a rate significantly above
chance. Tuinman (1973-1974) administered five major tests to
9,451 elementary-level students under several conditions.
Students in the no-passage condition (relevant passage had been
blacked out) on the average achieved only 30% fewer correct
answers than subjects in the passage-in condition. Similar
results were obtained by Pyrczak (1972, 1974, 1975, 1976) and
Bickley, Weaver, and Ford (1968). A follow-up study of passage
independence by Lifson, Scruggs, and Bennion (1984) revealed that
passage-independent items are still quite common in elementary
level achievement tests. College undergraduates were able to
answer 75%, or almost 12 of 16 questions on the Stanford
Achievement Test, Level P-3, without reading the associated
passages. This is considerably above chance.

Scruggs, Bennion, and Lifson (in press) interviewed
elementary age students regarding their responses on a reading
comprehension test. They found that students often chose their
answers based upon their own prior knowledge, rather than content
of the reading passage. When students reported using such prior
information, they answered correctly in over 60% of the cases.
Reading comprehension items which are independent of the
associated passage can be answered on the basis of the following:
(a) general knowledge, (b) interrelatedness of the questions on a
particular passage, and (c) faulty item construction, i.e., keyed
option is twice as long or more precisely stated (Pyrczak, 1975).
In the first two cases, the presence of enough information in the
question stem to identify the topic is an important factor (e.g.,
"Which of the following statements is NOT true of penguins?").
Such a stem may render a question answerable in terms of
information already available to the examinee, and provide clues
to the answers of related questions about the same passage that
lack such information in the stem ("This passage is about: a)
birds of South America, b) birds of the Antarctic ... etc.").
These clues, which individuals apply to a testing situation to
maximize their scores, correspond to Millman, Bishop, and Ebel's
(1965) criteria of test-taking skills, or "test wiseness."
While test constructors may be able to point to high validity
coefficients for their reading comprehension tests and subtests,
an important question arises concerning whether all students are
equally able to answer questions with the above mentioned
characteristics without reading the passage. Are some groups of
students at a relative advantage/disadvantage in ability to answer
these questions without reading the passage? To answer this
question, a group of students classified as learning disabled (LD)
and a group of regular classroom students were administered a
selection of multiple choice reading comprehension questions with
the relevant passages removed. The conditions of this experiment
were meant to resemble those of a normal testing situation, i.e.,
students were required to read the questions without assistance.
This did not permit us to determine the extent to which any
observed differences between the regular and LD students were due
to reasoning or variations in general knowledge between the two
groups or simply reflected a difference in reading ability. To
address this issue, a second experiment was performed to see if
similar differences could be found when word reading was
controlled for.

Experiment 1 

Method

Subjects and Materials 

Subjects consisted of 67 regular classroom and resource room
third grade students selected from several elementary schools in a
western rural area. Of these subjects, 52 were regular classroom
students and 15 were classified as LD by P.L. 94-142 and local
criteria, which included a 40% discrepancy between actual and
expected performance in two areas of academic functioning. The
average grade equivalent of the total reading score of the non-LD
students on the Comprehensive Test of Basic Skills (CTBS) was 3.4
(SD = .8), while the average CTBS total reading score for the LD
students was 2.1 (SD = .5).

Fourteen multiple choice reading comprehension questions
without the accompanying passages were selected for this task.
Items were drawn from the Stanford Achievement Test, Level P-3,
Form E (1982). Items had been chosen to represent questions
thought by the author to be answerable in terms of (a) the
general knowledge of the test taker, and (b) the degree to which
the interrelatedness of the items served as a cue to the answers.
These items were taken from the Lifson et al. (1984) study, in
which students' ability to answer these questions had been
documented. The items were kept in clusters which belonged
together in terms of association with a particular passage.
Procedure 

Treatment was administered in regular instructional
groupings. Materials were passed out and all students were told
that they were about to take a reading test for which they would
not be shown the accompanying reading passages, but that they
should try their best to answer all questions. No time limit was
imposed upon the task.
Results 

The regular classroom group answered correctly approximately
55% of the questions, for a mean score of 7.8 (SD = 1.96). This score
was significantly above a chance score of 3.5 (t(102) = 11.27,
p < .001). In contrast, the LD students answered correctly only 35%
of the questions, for a mean score of 4.9, only slightly higher
than chance (t(28) = 1.77, ns). The obtained score of the non-LD
group was significantly higher than the LD group (t(65) = 4.91,
p < .001).
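The chance score and percentages reported above can be checked with a short calculation. The following is a minimal sketch, assuming that each of the 14 items offered four answer options and that blind guessing is uniform; the four-option figure is not stated in the text and is inferred here from the reported chance score of 3.5. The function names are illustrative only.

```python
# Chance-level score on a multiple-choice test, assuming every item has
# the same number of options and guessing is uniform across options.

def chance_score(n_items: int, n_options: int) -> float:
    """Expected number of items answered correctly by blind guessing."""
    return n_items / n_options


def percent_correct(mean_score: float, n_items: int) -> float:
    """Mean raw score expressed as a percentage of the items."""
    return 100.0 * mean_score / n_items


N_ITEMS = 14    # number of items, from the text
N_OPTIONS = 4   # assumption: inferred from the reported chance score of 3.5

print(chance_score(N_ITEMS, N_OPTIONS))          # 3.5
print(round(percent_correct(7.8, N_ITEMS), 1))   # regular classroom group: 55.7
print(round(percent_correct(4.9, N_ITEMS), 1))   # LD group: 35.0
```

Under these assumptions the regular classroom mean lies about 4.3 items above the guessing level, while the LD mean lies about 1.4 items above it.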

Discussion
The present findings suggest that regular classroom students
are able to recognize and make use of cues in testing situations
in order to increase their scores, even when reading passages are
deleted, and "reading comprehension" supposedly cannot be
measured. Apparently, LD students are not able to benefit equally
from these cues. Since neither group should have scored above
chance on a reading comprehension test with the reading passages
deleted, it is possible that a certain amount of bias exists
against children with learning disabilities on some standardized
tests of reading comprehension. Students in regular classes, when
unable to read or otherwise obtain meaning from reading passages,
are still able to answer comprehension questions correctly.
Students with learning disabilities, however, do not seem to have
these skills, and are thereby punished twice for a reading
handicap: once for being less able to read and comprehend the
passage, and a second time for being unable to "second guess" test
questions, as their nonhandicapped peers are apparently able to
do.

One possible explanation for this discrepancy between LD and
regular classroom students is that LD students are simply less
able to read (decode) the questions, and for that reason are less
able to outguess the test. That is, LD students are less
deficient in "test taking skills" than they are in reading
ability. In order to address this question, a second experiment
was designed, in which ability to read would be controlled for.
Although the conditions in this experiment could not parallel
those of standardized test procedures, they did allow for an
assessment of the extent to which differential scores are
attributable to lower reading skills, or to lower levels of
"test-wiseness."

Experiment 2 

Method 

Subjects and Materials 

The 42 subjects who participated in this investigation were
different students drawn from the same population as those of
Experiment 1, and consisted of 27 regular classroom third grade
students and 15 third grade children classified as LD by P.L.
94-142 and local district criteria. Mean grade equivalent for the
non-LD group (CTBS total reading) was 3.6 (SD = .9), and 1.9 (SD = .4)
for the LD group. Materials were 14 items drawn from the
Stanford Achievement Test, Level P-3, Form F, and were chosen on
the same basis as those used in Experiment 1. Pages of the test
were again left intact with questions left in the original order
and the passages themselves blacked out during the copying
process.

Procedure

Students were informed by their teacher that they were about
to take a reading test without reading the corresponding passages.
They were told to listen while the teacher read each item, and
then answer the items.

Results and Discussion 
The students in regular classrooms answered correctly 65% of
the fourteen items, for a mean score of 9.14 (SD = 1.8). The LD
students, on the other hand, answered correctly only 45% of the
items, for a mean score of 6.33 (SD = 1.8). Although both obtained
scores are well above chance (t(52) = 12.02 and t(28) = 4.325,
ps < .001, for regular classroom and LD students, respectively), the
regular classroom group maintained its advantage over the LD
students, t(40) = 4.87, p < .001. The results suggest that learning
disabled students are less likely to apply test-taking strategies
to reading comprehension questions to a degree of efficiency
similar to their non-LD counterparts.
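The between-group comparison can be approximately reproduced from the summary statistics given above. The sketch below assumes an independent-samples t test with pooled variance; the text does not state which form of the test was used, so this is an illustration rather than a reconstruction of the original analysis, and the function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic (pooled variance) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, df


# Experiment 2 values from the text: regular classroom (n = 27) vs. LD (n = 15).
t, df = pooled_t(9.14, 1.8, 27, 6.33, 1.8, 15)
print(f"t({df}) = {t:.2f}")  # t(40) = 4.85
```

The result, about 4.85 on 40 degrees of freedom, is close to the reported t(40) = 4.87; the small gap is plausibly due to rounding in the reported means and standard deviations.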
General Discussion
In Experiment 1, regular third grade classroom students were
seen consistently to outscore their LD counterparts on a test of
reading comprehension questions with corresponding passages
deleted, and administered under conditions resembling standardized
testing procedures. In Experiment 2, regular class third graders
again outscored LD students, under conditions for which reading
ability was controlled. The ability of third grade children in
these cases to score 55% and 65% correctly on questions which
refer to non-existent passages seems remarkable, and brings into
question the issue of what some tests of "reading comprehension"
are really measuring. Such passage independent items have been
thought to assess test-taking skills and in fact have been used
as measures of "test-wiseness" (e.g., Derby, 1978). Whatever such
tests measure, it is clear that: (a) it is not "reading
comprehension," and (b) children classified as LD are at
an apparent disadvantage.

An argument can be made that these comparisons are of trivial
importance, since in standardized test administration passages
are not deleted, and that all children in fact have equal access to
passages which contain answers to reading comprehension questions.
Although this argument has a certain face validity, some problems
remain. First, since non-LD students can score so high on such
items without reading the passages, the extent to which scores are
a direct measure of "reading comprehension" seems uncertain.
Second, since nearly all such tests are timed, students with
incomplete understanding of relevant passages but possessing an
ability to "outguess" test questions under time constraints
clearly are at an advantage with respect to students not
possessing such an ability. In this case, differences in scores
on reading comprehension tests may in fact reflect in part a bias
toward students with superior "test-wiseness." As has been seen
in the present experiments, LD students may well find themselves
on the negative side of any such bias.

The extent to which LD and their non-LD counterparts differ
on the present measures appears to have surprisingly little to do
with reading ability. Although both groups gained when reading
(decoding) ability was controlled for, each group was seen to
exhibit the same degree of gain, amounting to 10 percentage points
for each group. Reported t values in Experiments 1 and 2 remained
virtually identical. It seems clear, then, that much of the
observed performance difference in Experiment 1 was due to skills
other than reading ability, or "reading comprehension." Possibly,
relative deficits in vocabulary knowledge account for some of
these differences. What also may be a factor is a differential
ability to respond to specific cues in the test-taking situation.

Two steps may be taken to help alleviate this potential
source of bias. First, achievement tests should be revised so
that reading comprehension tests directly assess comprehension of
the provided passage. In fact, an informal review by the present
authors of the major achievement tests indicates that many
achievement test questions appear to be much less "passage
independent" since the work of Tuinman (1973-1974) and others of a
decade ago. Second, it seems possible that at least some of these
"test-taking skills" can be trained, and that this training may do
much to correct this apparent disadvantage. The authors are at
present investigating the effectiveness of such training (Taylor &
Scruggs, 1983). Although such improved scores on tests may not
necessarily reflect increased achievement, these scores could
reflect more accurately achievement gains students have made, as
evaluated by standardized achievement tests.
PASSAGE INDEPENDENCE IN READING ACHIEVEMENT
TESTS: A FOLLOW-UP¹

STEVE LIFSON, THOMAS E. SCRUGGS, AND KARLA BENNION
Utah State University

Summary. 38 college undergraduates were administered reading-comprehension
items from a major standardized achievement test with corresponding
passages deleted. Analysis indicated that, after 20 years of similar research
findings, highly passage-independent items still exist on major tests.

For almost 20 years, it has been documented that reading-comprehension
test items can be answered correctly at above-chance rates without actually
reading the relevant passage (Preston, 1964). Pyrczak (1976) mentions
several types of items which seem particularly independent of the passage.
These types include (a) items that can be answered from the examinee's own
knowledge and (b) items about a particular passage that are related to each
other in such a way that some items provide clues for other items.
Reading-comprehension tests which include such items invite critical attention on the
grounds that (a) examinees may have an advantage over those not using these
strategies (Pyrczak, 1972) and (b) if a subject uses these principles and
skips passages, he invalidates the purpose of the test (Tuinman, 1973-1974).
Since an extensive review of the literature has shown no justification for the
use of passage-independent items, the question arises as to whether these items
still occur in commonly used standardized achievement tests. The present
investigation was intended to determine whether such items are still in use.

Method 

Subjects and Materials

Thirty-eight undergraduate elementary education students at a western
university completed 16 multiple-choice reading-comprehension questions
without the accompanying passages. The items selected were thought to
represent questions that could be answered without having read the accompanying
passage. These items were chosen to correspond to Millman, Bishop, and Ebel's
(1965) categories of test-wiseness strategies involving the general knowledge
of the test taker and use of subject matter of neighboring items. The specific
effects of these cues, however, were not addressed in this study. The 16 items
were taken from the Stanford Achievement Test, Form E, Level P-3, from a
pool of 60 items. The items were kept in clusters illustrating which belonged
together in terms of association with a particular passage.

¹The authors thank [name illegible] for his kind and generous assistance with this
investigation. Requests for reprints should be addressed to Steve Lifson, Exceptional
Child Center, UMC 68, Utah State University, Logan, Utah,
Procedure

The materials were distributed to two sections of a class in teaching
reading. The students were told: "Today I'm going to give you some
reading-comprehension test items without the passages. It is not expected that you
will answer all of the questions correctly; just do your best. Guess if you do
not know the answer." No time limit was imposed upon the task.

Results and Discussion 

Analysis indicated that the mean score was 73% correct, with an average
mean score of 11.9 of the 16 items. A one-sample t test (Hayes, 1973)
confirmed that the obtained scores were significantly different from chance
responding (t = 18.9, p < .001).

Although the items were not randomly selected for this measure, they
nevertheless represented 25% of the items included in the reading-comprehension
section of the test. Clearly, at least some test developers have done
little to alter passage-independent items in light of the research findings of
almost two decades. While the effects of the readers' previous knowledge
cannot be eliminated, the effects could be minimized by the use of fictional
material for the passages with accompanying questions about the activities of
an imaginary person. In spite of the reported validity of these items (SRA,
1979), the burden of construct validity rests with the authors of the tests. If
some students are able to answer "reading-comprehension" test items correctly
without reading the passage, one can question what is being measured.
Abstract

The present investigation was intended to provide information on
the type of strategies employed by learning disabled (LD) students
on standardized, group-administered achievement test items. Of
particular interest was level of strategy effectiveness and
possible differences in strategy use between LD and non-disabled
students. Students attending resource rooms and regular third
grade classes were administered items from reading achievement
tests and interviewed individually concerning the strategies each
had employed in answering the questions and level of confidence in
each answer. Results indicated that (a) LD students were less
likely to report use of appropriate strategies on inferential
questions, (b) LD students were less likely to attend carefully to
specific format demands, and (c) levels of confidence reported by
LD students were inappropriately high.
Spontaneously Employed Test-Taking Skills
of Learning Disabled Students on
Reading Achievement Tests

Since the seminal work of Millman, Bishop, and Ebel in 1965,
concern has been given to the issue of test-taking skills, or
"test-wiseness," as a source of measurement error in
group-administered achievement tests (Sarnacki, 1979). Defined as "a
subject's capacity to utilize the characteristics and formats of
the test and/or the test-taking situation to receive a high score"
(Millman et al., 1965, p. 707), test-wiseness is said to include
such diverse components as guessing, time-using, and deductive
reasoning strategies. Given that the effective use of such
strategies may have little to do with knowledge of a particular
academic content area, individuals or groups of individuals
lacking in these skills may be at a disadvantage. A recently
completed meta-analysis, for example, has suggested that under
certain circumstances, low-SES students are more likely to benefit
from achievement test "coaching" than are higher-SES students,
a finding which implies that low-SES students are relatively
deficient in the area of test-taking skills (Scruggs, Bennion, &
White, 1984).

The present investigation was concerned with the spontaneous
use of such strategies by learning disabled (LD) children. Part
of a larger investigation involving test-taking skills of
exceptional students (Taylor & Scruggs, 1983), the present study
had as a goal the identification of possible deficits in
test-taking skills on the part of LD children. Such deficits, if
uncovered, could be helpful in developing techniques for
remediation.

Although much research has been conducted on non-handicapped
populations in the area of test-taking skills (see Bangert-Drowns,
Kulik, & Kulik, 1983; Sarnacki, 1979; and Scruggs, Bennion, &
White, 1984, for reviews), little is known about test-taking
skills exhibited by LD children. Scruggs and Lifson (1984)
recently investigated the differential ability of LD students to
answer "passage independent" reading comprehension test items
(i.e., reading comprehension test items for which relevant
passages had been omitted). Items were taken from standardized
achievement tests known from previous research to be answerable
without having read the associated passage (Lifson & Scruggs, in
press), and thought to be a good measure of "test-wiseness." In
two experiments, non-handicapped children scored 55% and 65%
correct on such items, while LD children from the same grade
scored much lower, even when word reading ability was controlled.
Scruggs and Lifson (1984) argued that such findings also raised
the question of what reading comprehension tests "really" measure,
since no reading comprehension test items should be answerable
without having read the associated passage. Scruggs and Lifson
concluded that LD children may be at a relative disadvantage with
respect to such test-taking skills as guessing, elimination, and
deductive reasoning strategies applied to response items.

Scruggs, Lifson, and Bennion (in press) recently employed
individual interview techniques to more precisely determine the
nature of the strategies spontaneously produced by elementary
school children on reading achievement tests. Students
representing a wide range of age and ability levels were given
reading achievement test items appropriate to each student's
reading level. Results indicated that students employed a wide
range of strategies on reading achievement tests, far beyond
simply "knowing" or "not knowing" the answer, and that the use of
these strategies was strongly predictive of performance. These
findings provided valuable general information regarding the
manner in which children respond to reading achievement test
items. However, the diversity of the population in age and
achievement level was thought to have obscured observation of
specific differences in test-taking skills between age or ability
levels. The present investigation, therefore, was intended to
determine whether differences in strategy use existed on reading
achievement tests between LD and non-disabled students. In this
investigation, grade level was held constant and the number of
subtests was reduced to two: a "reading comprehension" subtest,
in which direct referring, elimination, and deductive reasoning
strategies were thought to be important; and a "letter sounds"
subtest, in which close attention to format demands was thought to
be of importance. In addition, since level of reported confidence
was found to be a strong predictor of performance (Scruggs,
Bennion, & Lifson, in press), and a prerequisite to strategy
monitoring, confidence reports were examined for possible
differences between ability groups.

Method

Subjects

Subjects were 32 third grade students attending public
schools in a western university community. Twelve were classified
as learning disabled (LD) according to local school district
criteria, which included a 40% discrepancy between ability and
performance in two academic areas, and Public Law 94-142. Twenty
were regular classroom students, none of whom were referred for
special services and who were thought by their teachers to be
functioning within a normal range of performance. Mean grade
equivalent for reading comprehension on the Comprehensive Test of
Basic Skills (CTBS) for the LD students was 2.29 (SD = .29),
equivalent to a percentile score of approximately 21. Mean grade
equivalent on the CTBS for the non-LD students was 3.91
(SD = .89), equivalent to a percentile score of 61. The 16 boys
and 16 girls were all 8-9 years old and Caucasian. Sex was evenly
represented in LD and non-LD groups.

Materials 

Two reading tests were constructed from items taken from the
Stanford Achievement Test. Items were drawn from the Primary 2
battery for the instrument used with the LD group, and the
Intermediate 1 level served as the source for the regular
classroom group. Each test contained three reading passages with
14 dependent questions (10 content, 4 inference) on each form.
Comprehension questions were left in their original order in
relation to the passage. The questions were renumbered to avoid
gaps where passages did not follow each other sequentially in the
original test. In addition, three items from the letter-sound
test (Level P-3) were selected. These consisted of a stimulus word
with a letter or letters underlined representing a sound that the
student was to identify in the three options given below the
stimulus word. These items were selected to include a distractor
that closely matched the initial consonants of the stimulus word.
For example, in the item:

    blind

    O blink
    O nibble
    O leaned

"leaned" is the correct answer, since it contains the same sound
as the underlined nd in the stem, and "blink" is the inappropriate
distractor, since it contains the same initial consonant blend.
Procedure

Subjects were seen individually by one of two examiners.
They were asked to read the passages and questions aloud and mark
answers they thought were correct. They were then told that they
would be asked if they were sure/not sure that the answer they had
given was correct, and the manner in which they had chosen the
particular answer. The subject's responses to the questions, "How
did you choose that answer?" and "Are you sure or not sure of your
answer?" were recorded verbatim on the protocol. Words the
experimenters had previously deemed essential to answering the
questions (key words) were marked in the examiner's copy of the
instrument, and errors in these words were noted as the child read
aloud.

Scoring

Test items were scored for correctness, confidence in answer
(sure/not sure), and type of strategy reported. Two students from
the non-LD group, who had misread more than 25% of the keywords,
were excluded from further analysis. The responses given by the
subjects were divided into seven logical categories:

1 = Don't know
2 = Guessed
3 = External source of knowledge (e.g., "I know all fish have
    scales")
4 = Refers to passage (e.g., "I read it")
5 = Quotes directly (e.g., "it says here that . . .")
6 = Eliminates options known to be incorrect
7 = Other reasoning (e.g., "It said comforted in the story.
    That sort of means relieved.")

Each response was then evaluated in terms of those seven
categories. Percent of agreement for scoring was assessed at 100%
after each examiner scored 25% of the other examiner's protocols.

Results 

A t test applied to percent of keywords read incorrectly
indicated that the groups did not differ significantly with
respect to reading difficulty, t(29) = .37, p > .20. Overall, LD
students misread 6.6% of (30) total keywords and non-LD students
misread 6.75% of (29) keywords.

Proportion correct by collapsed strategy group
(inappropriate = strategies 1-3; referring = strategies 4-5;
reasoning = strategies 6-7) was computed for item type and student
group and is given in Figures 1 and 2.



Insert Figures 1 and 2 about here 



Reported strategy data were scored for appropriateness of
reported strategy. Strategies were considered appropriate if
students reported referring to the passage on a recall question
(strategy 4 or 5), or if they reported a reasoning strategy in
response to an inferential question (strategy 6 or 7). Proportions
of appropriate responses were then entered into a 2 group (LD vs.
non-LD) by 2 item type (direct recall or inferential) analysis of
variance (ANOVA) with repeated measures on the item type variable.
Because of the unequal group frequencies, a least-squares method
of analysis (Winer, 1971) was employed. Significant differences
were found for item type, F(1,29) = 9.19, p < .01, and for
interaction, F(1,27) = 7.58, p < .05. Figure 3 depicts
graphically the interaction effect. Although both LD and non-LD
students reported a high proportion of "referring to text"
strategies on recall questions (89% vs. 77%, respectively), large
differences emerged in proportion of reasoning strategies applied
to inferential questions (39% vs. 70%, respectively).
Nonsignificant differences were observed for overall group means,
F(1,29) = 1.54, ns.



Insert Figure 3 about here
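The coding rules described above (seven reported strategies collapsed into three groups, with "appropriate" defined by item type) can be sketched as a small scoring routine. This is an illustration only: the function names and the sample responses are hypothetical and are not data or procedures taken from the study.

```python
# Collapse the seven reported-strategy codes into the three groups
# named in the text: inappropriate (1-3), referring (4-5), reasoning (6-7).
COLLAPSED = {
    1: "inappropriate", 2: "inappropriate", 3: "inappropriate",
    4: "referring", 5: "referring",
    6: "reasoning", 7: "reasoning",
}


def is_appropriate(item_type: str, strategy: int) -> bool:
    """Referring is appropriate on recall items; reasoning on inferential items."""
    group = COLLAPSED[strategy]
    if item_type == "recall":
        return group == "referring"
    if item_type == "inferential":
        return group == "reasoning"
    raise ValueError(f"unknown item type: {item_type}")


def proportion_appropriate(responses, item_type: str) -> float:
    """Share of a student's responses on one item type that used an appropriate strategy."""
    codes = [code for kind, code in responses if kind == item_type]
    return sum(is_appropriate(item_type, code) for code in codes) / len(codes)


# Hypothetical responses for one student: (item type, strategy code).
responses = [
    ("recall", 4), ("recall", 5), ("recall", 2), ("recall", 4),
    ("inferential", 7), ("inferential", 3),
]
print(proportion_appropriate(responses, "recall"))       # 0.75
print(proportion_appropriate(responses, "inferential"))  # 0.5
```

Per-student proportions of this kind, one for each item type, are the sort of values that would enter a group by item type analysis.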

Analysis of confidence reports indicated that both groups
were similar with respect to reported level of confidence on
"referring to passage" strategies, with LD students reporting
confidence in 85% of the cases and non-LD students reporting
confidence in 92% of the cases. These reports were similar to
actual performance, with correct scores of 81% and 86% on these
items for LD and non-LD groups, respectively. On reasoning
strategies, however, a much different picture emerged. Average
students were correct on 83% of inferential items, but reported
confidence on an average of 71% of the items. The LD students, on
the other hand, reported being confident on an average of 95% of
the cases, but were in fact correct in only 63% of these cases.
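One simple way to summarize these figures is the gap between reported confidence and actual accuracy. The sketch below uses the percentages given above; the "overconfidence" index (percent confident minus percent correct) is an illustrative measure introduced here, not one computed in the study, and the percentages for the two measures may rest on slightly different sets of items.

```python
# (percent of cases reported "sure", percent answered correctly), from the text.
reports = {
    ("LD", "referring"): (85, 81),
    ("non-LD", "referring"): (92, 86),
    ("LD", "reasoning"): (95, 63),
    ("non-LD", "reasoning"): (71, 83),
}


def overconfidence(confident_pct: int, correct_pct: int) -> int:
    """Positive values mean confidence exceeds accuracy; negative, the reverse."""
    return confident_pct - correct_pct


for (group, strategy), (confident, correct) in reports.items():
    gap = overconfidence(confident, correct)
    print(f"{group:6s} {strategy:9s} {gap:+d}")
# LD     referring +4
# non-LD referring +6
# LD     reasoning +32
# non-LD reasoning -12
```

On this index the two groups look alike for referring strategies, whereas for reasoning strategies the LD group is overconfident by 32 points and the non-LD group underconfident by 12.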

Items on the letter-sound subtest were scored for responses
which suggested attention to an inappropriate distractor. This
inappropriate distractor took the form of an initial consonant
blend present in the stem, but not underlined. Number of
inappropriate distractors chosen was compared by group, and
differences found to be significant, t(28) = 2.47, p < .05. The
LD children chose the inappropriate distractor in 52% of the
cases, while the non-LD children chose the inappropriate
distractor in only 24% of the cases.

Discussion 

It has been seen that the present sample of LD third graders,
with reading ability controlled for, differed from their more
average counterparts with respect to (a) proportion of appropriate
reasoning strategies reported for inferential comprehension
questions, (b) performance and confidence level for items in which
reasoning strategies had been reported, and (c) choice of an
inappropriate distractor on a letter-sounds test. On the other
hand, LD students did not differ from their more average
counterparts with respect to appropriate strategy use on recall
items. Generally, this sample of LD children was seen to report
fewer reasoning strategies, when appropriate, on reading
comprehension test items than did their more average counterparts,
and to be less successful on those items for which they did report
reasoning strategies. These findings support those reported by
Scruggs and Lifson (1984). In that study, LD children were seen
to exhibit relatively inferior performance on a test of selected
reading comprehension test items for which the relevant passages
had been removed, and for which reasoning strategies were thought
to be necessary in order to answer the items correctly. The
present finding of inappropriately high levels of confidence
exhibited by the LD students on items for which reasoning
strategies had been applied is supportive of a theory of a
developmental deficit in "meta-cognitive abilities" (cf. Torgesen,
1977), in that inappropriately high levels of confidence in task
performance are often seen in younger children. This relative
deficit on the part of LD children is thought to be a critical
one, for ability to evaluate accurately a chosen response is a
necessary prerequisite for effective test-taking performance.

That LD students more often attended to an inappropriate
distractor may be a function of an attentional deficit (Krupski,
1980) on test format as much as a deficit in phonetic skills.
These "test-taking skills" may or may not be subject to simple
remediation (Taylor & Scruggs, 1983), but they may reflect a
source of measurement error (Millman, Bishop, & Ebel, 1965).

Reading comprehension, clearly, is a construct which seems to
resist precise analysis and for which many theoretical
orientations exist (Spiro, Bruce, & Brewer, 1980). If one does
look at recall and inference as two component parts of reading
comprehension, however, it appears from the present investigation
that relative strategy and performance deficits exist on the part
of LD children on inference questions, but not on recall
questions, with reading ability controlled for. To this extent,
one could argue that the specific deficits exhibited here reflect
problems in reading comprehension itself rather than "test-taking
skills," and it does seem likely that strategy training in such
areas could reflect improved reading comprehension skills as well
as improved test-taking skills, particularly in that selecting and
implementing appropriate strategies has been used in research to
improve general cognitive functioning (cf. Torgesen & Kail, 1980).
In the word study skills subtest, however, the LD students
apparently became confused by specific format demands which likely
had little to do with the content being tested. Training for this
type of strategy deficit, then, would not be expected to bring
about a concomitant increase in phonetic analysis skills.

Replication is necessary to further support and refine these 
findings. The present results suggest that LD children may 
benefit from specific training in (a) attending to specific format 
demands, (b) identifying inference questions, and (c) selecting 
and applying appropriate strategies relevant to those questions. 
Figure Captions

Figure 1. Proportion correct by strategy used on recall items.

Figure 2. Proportion correct by strategy used on inferential
items.

Figure 3. Illustration of two-way interaction of group by
reported use of appropriate strategy.
Spontaneously Employed Test-Taking Strategies
of High and Low Comprehending Elementary
School Children

Thomas E. Scruggs, Karla Bennion, and Steve Lifson
Exceptional Child Center
Utah State University

If important decisions are to be based on the results of standardized reading tests, student scores
should provide the best possible estimate of reading performance. Unfortunately, the results of past
research have indicated that student standardized test performance can be influenced by factors other
than knowledge of test content. One of these factors, test-wiseness, includes time-using strategies,
error avoidance strategies, guessing strategies, and deductive reasoning strategies.

A question emerges concerning the extent to which elementary school students employ "test-taking"
strategies when faced with difficult or ambiguous reading test items. Do students spontaneously use
such strategies (that is, without being trained)? If so, which strategies (if any) are effective in
obtaining correct answers?

To address these questions in the present study, the reading test performance of elementary school
children was examined. In Experiment 1, two areas were investigated: (a) the strategies
spontaneously employed by students to answer reading test items, and (b) the relative effectiveness
of these strategies in increasing reading test scores. Experiment 2 examined the possible difference
in use or utility of these strategies between average and learning disabled (LD) third graders.

EXPERIMENT 1

Method

A sample reading test based upon items from the Stanford Achievement Test (SAT) was administered to
31 elementary age Caucasian students (15 girls, 16 boys) attending summer classes in a western rural
area. Students were selected so that a range of abilities as well as grade levels (1-6) were
represented.

All students were seen individually by one of four examiners. Students were given selections from
the SAT taken from the level one year higher than their assessed grade level on the Woodcock Reading
Achievement Test, Passage Comprehension subtest. In this manner, a similar difficulty level was
provided for each student. Most students were able to answer correctly approximately two-thirds of
the test questions.

Students were then told to read aloud each test question (as well as the reading passages in the
reading comprehension subtest), and to read aloud whichever of the distractors they chose to read.
They were neither encouraged nor discouraged from reading each distractor. As soon as students had
answered a test question, they were asked to rate their level of confidence in their response. After
students had finished each subtest, they were asked to re-read the questions and tell the examiner
why they had chosen the answer they did. The examiner recorded reading errors, confidence levels,
attention to distractors, reference to reading passage, and reported strategies.

Results

It was found that all strategy responses could be classified within a 10-level hierarchy which
strongly predicted probability of correct responding. Proportion of correct responses was computed
across subjects for each type of strategy used and is shown in Figure 1.

These strategies were collapsed into five logical categories (skipping, procedural error, guessing
strategy, deliberate strategy, and "knowing"), and point biserial correlations were computed for each
subject. The median correlation between item score and reported strategy was .54 (p < .01). No
differential effects were seen by age or ability level, possibly due to the diverse nature of the
sample.
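
The per-subject correlation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numeric coding of the five collapsed categories (1 = skipping through 5 = "knowing") and the two hypothetical subjects are invented here, and the function name point_biserial is not from the study.

```python
from statistics import mean, median, pstdev


def point_biserial(item_scores, strategy_codes):
    """Correlate 0/1 item scores with numeric strategy codes for one subject."""
    if len(item_scores) != len(strategy_codes):
        raise ValueError("need one strategy code per item score")
    right = [c for s, c in zip(item_scores, strategy_codes) if s == 1]
    wrong = [c for s, c in zip(item_scores, strategy_codes) if s == 0]
    spread = pstdev(strategy_codes)  # population SD of the strategy codes
    if not right or not wrong or spread == 0:
        raise ValueError("correlation is undefined without variation")
    p = len(right) / len(item_scores)  # proportion of items answered correctly
    return (mean(right) - mean(wrong)) / spread * (p * (1 - p)) ** 0.5


# Hypothetical data; codes assume the ordering 1 = skipping ... 5 = "knowing".
subjects = {
    "A": ([1, 1, 0, 0], [5, 4, 2, 1]),
    "B": ([1, 0, 1, 0], [5, 3, 4, 3]),
}
per_subject = [point_biserial(s, c) for s, c in subjects.values()]
median_r = median(per_subject)  # summary statistic of the kind reported above
```

Computing one coefficient per subject and then taking the median, as the text describes, keeps a few atypical children from dominating the summary.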
Students had a reasonably good idea of whether they had answered a test question correctly or not.
When students reported being "very sure" their answer was correct, they were in fact correct 8U of
the time. When they reported being "somewhat sure," they were correct only 13J of the time, and when
they reported being "not sure," they obtained correct answers in only 7% of the cases. These figures
are somewhat misleading, however. If looked at another way, the results seem different: when
students answered correctly, they also reported being "very sure" the answer was correct in 56% of
the cases.
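
The two ways of looking at the confidence data are different conditional proportions, which is why they can disagree. A minimal Python sketch of the distinction follows; the response records are hypothetical and are not the study's data, and the function name conditional_rates is an assumption made for illustration.

```python
def conditional_rates(responses, level="very sure"):
    """Return (P(correct | level), P(level | correct)) from (label, correct) pairs."""
    at_level = [correct for label, correct in responses if label == level]
    labels_when_correct = [label for label, correct in responses if correct]
    if not at_level or not labels_when_correct:
        raise ValueError("both conditioning groups must be non-empty")
    accuracy_given_level = sum(at_level) / len(at_level)
    level_given_correct = labels_when_correct.count(level) / len(labels_when_correct)
    return accuracy_given_level, level_given_correct


# Hypothetical responses, not the study's data.
responses = (
    [("very sure", True)] * 4
    + [("very sure", False)]
    + [("not sure", True)] * 4
    + [("not sure", False)] * 3
)
accuracy, share = conditional_rates(responses)
```

In this invented sample, "very sure" answers are usually right, yet only half of the right answers were given with that level of confidence.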


A great deal of carelessness was observed in attention to all distractors. When students answered
incorrectly, in 40% of the 302 cases they had not read all distractors. When students answered
questions correctly, they had attended to all distractors in ^1 \VL of the 577 cases.

The results of Experiment 1 provided valuable general information about the manner in which children
respond to reading achievement test items. However, the diversity of the population, in age and
ability level, was thought to have obscured direct investigation of specific differences with respect
to specific ability levels. Experiment 2, therefore, was conducted in order to determine whether
differences in strategy use existed between a sample of learning disabled children and a sample of
children not so classified. In order to clarify interpretation, grade level was held constant, and
the number of subtests was reduced to two.

EXPERIMENT 2 

Method

Subjects were 32 third grade students attending public schools in a rural area of a western state.
Twelve were classified as learning disabled (LD) according to local school district criteria and P.L.
94-142, and 20 were regular classroom students, none of whom were referred for special services and
who were thought by their teachers to be functioning within a normal range of performance. Sex was
evenly represented in LD and non-LD groups.


Two reading tests were constructed from items taken from the Stanford Achievement Test. Items were
drawn from the Primary 2 battery for the instrument used with the LD group, and the Intermediate 1
level served as the source for the regular classroom group. The tests each contained three passages
with 15 dependent questions. Items were adjusted to ensure that 14 questions (10 content, 4
inference) remained on each form. Comprehension questions were left in their original order in
relation to the passage. The questions were renumbered to avoid gaps where passages did not follow
each other sequentially in the original test. In addition, three items from the letter-sound test
(level P3) were selected. These consisted of a stimulus word with a letter or letters underlined
representing a sound that the student had to identify in the three options given below the stimulus
word. These items were selected to include a distractor that closely matched the initial consonants
of the stimulus word. For example, in the item:

    blind

    O blink
    O nibble
    O leaned

"leaned" is the correct answer, since it contains the same sound as the underlined nd in the stem,
and "blink" is the inappropriate distractor, since it contains the same initial consonant blend.

Subjects were seen individually by one of two examiners. They were asked to read the passages and
questions aloud and mark answers they thought were correct. They were then told that they would be
asked if they were sure/not sure that the answer they had given was correct, and the manner in which
they had chosen the particular answer. The subject's responses to the questions, "How did you choose
that answer?" and "Are you sure or not sure of your answer?" were recorded verbatim on the protocol.
Words the experimenters had deemed essential to answering the questions (key words) were marked in
the examiner's copy of the instrument, and errors in these words were noted as the child read aloud.

Results

Test items were scored for correctness, confidence in answer (sure/not sure), and type of strategy
reported. Two students from the non-LD group, who had misread more than 25% of the keywords, were
excluded from further analysis. The responses given by the subjects were divided into seven logical
categories:

1 = Don't know
2 = Guessed
3 = External source of knowledge (e.g., "I know all fish have scales")
4 = Refers to passage (e.g., "I read it")
5 = Quotes directly (e.g., "It says here that . . .")
6 = Eliminates options known to be incorrect
7 = Other reasoning (e.g., "It said comforted in the story. That sort of means relieved.")
Each response was then evaluated in terms of those seven categories. Percent of agreement for
scoring was assessed at 100% after each examiner scored 25% of the other examiner's protocols.

Proportion correct by collapsed strategy group (inappropriate = strategies 1-3; referring =
strategies 4-5; reasoning = strategies 6-7) was computed for item type and student group and is given
in Figures 2 and 3. As can be seen, a monotonically increasing trend is seen for both groups.

A t test applied to percent of keywords read incorrectly indicated that the groups did not differ
significantly with respect to reading difficulty, t(29) = .37, p > .20. Overall, LD students misread
6.61% of total keywords and non-LD students misread 6.75% of keywords.

Reported strategy data were scored for appropriateness of reported strategy. Strategies were
considered appropriate if students reported referring to the passage on a recall question (strategy 4
or 5), or if they reported a reasoning strategy in response to an inferential question (strategy 6 or
7). Proportions of appropriate responses were then entered into a 2 group (LD vs. non-LD) by 2 item
type (direct recall or inferential) analysis of variance (ANOVA) with repeated measures on the item
type variable. Because of the unequal group frequencies, a least-squares method of analysis (Winer,
1971) was employed. Significant differences were found for item type, F(1,29) = 9.19, p < .01, and
for the interaction, F(1,27) = 7.58, p < .05. The interaction effect is depicted graphically in the
accompanying figure. Although both LD and non-LD students reported a high proportion of "referring
to text" strategies (89% vs. 77%, respectively), large differences emerged in proportion of reasoning
strategies applied to inferential questions (39% vs. 70%, respectively). Nonsignificant differences
were observed for overall group means, F(1,29) = 1.54, ns.


Analysis of confidence reports indicated that both groups were similar with respect to reported level
of confidence on "referring to passage" strategies, with LD students reporting confidence in 85% of
the cases and non-LD students reporting confidence in 92% of the cases. These reports were similar
to actual performance, with correct scores of 81% and 86% on these items for LD and non-LD groups,
respectively. On reasoning strategies, however, a much different picture emerged. Average students
were correct on 83% of inferential items, but reported confidence on an average of 71% of the items.
The LD students, on the other hand, reported being confident on an average of 95% of the cases, but
were in fact correct in only 63% of these cases.


Items on the letter-sound subtest were scored for responses which suggested attention to an
inappropriate distractor. This inappropriate distractor took the form of an initial consonant blend
present in the stem, but not underlined. Number of inappropriate distractors chosen was compared by
group, and differences were found to be significant, t(28) = 2.47, p < .05. The LD children chose the
inappropriate distractor in 52% of the cases, while the non-LD children chose the inappropriate
distractor in only 24% of the cases.

Discussion

It was seen that the present sample of LD third graders, with reading ability controlled for,
differed from their more average counterparts with respect to (a) proportion of appropriate reasoning
strategies reported for inferential comprehension questions, (b) performance and confidence level for
items for which reasoning strategies had been reported, and (c) choice of an inappropriate distractor
on a letter-sounds test. On the other hand, LD students did not differ from their more average
counterparts with respect to appropriate strategy use on recall items. Generally, this sample of LD
children was seen to report fewer reasoning strategies, when appropriate, on reading comprehension
test items than did their more average counterparts, and to be less successful on those items for
which they did report reasoning strategies. The inappropriately high levels of confidence exhibited
by the LD students on items for which reasoning strategies had been applied are supportive of a theory
of a developmental deficit in "meta-cognitive abilities" (cf. Torgesen, 1977), in that
inappropriately high levels of confidence in task performance are often seen in younger children.

Reading comprehension, clearly, is a construct which seems to resist precise analysis and for which
many theoretical orientations exist. If one does look at recall and inference as two component parts
of reading comprehension, however, it appears from the present investigation that relative strategy
and performance deficits exist on the part of LD children on inference questions, but not on recall
questions, with reading ability controlled for.

Replication is necessary to further support and refine these findings. The present results suggest
that LD children may benefit from specific training in (a) identifying inference questions,
(b) selecting appropriate strategies relevant to those questions, and (c) successfully applying such
strategies.
Abstract

Results of 24 studies which investigated the effects of training
elementary school children in test-taking skills on standardized
achievement tests were analyzed using meta-analysis techniques.
In contrast to all previous reviewers, the results of this
analysis suggest that training in test-taking skills has only a
very small effect on students' scores on standardized achievement
tests. Longer training programs are more effective, particularly
for students in grades 1-3, and for students from low
socioeconomic status backgrounds. Results from previous reviews of
this body of literature are critiqued and explanations offered as
to why the results of the present investigation are somewhat
contradictory to previous reviewers' conclusions. Suggestions for
further research are given.
Teaching Test-Taking Skills to Elementary
Grade Students: A Meta-Analysis

Since the seminal work of Millman, Bishop, and Ebel (1965),
much attention has been directed to the influence of test-taking
skills, or "test-wiseness," on scores of achievement tests.
Assumptions from the past have included that test-wiseness is a
substantially separate variable not strongly correlated with
intelligence (Diamond & Evans, 1972), that test-taking skills are
alterable by training, and that these skills would transfer to
higher scores on achievement tests (Ford, 1973; Fueyo, 1977;
Sarnacki, 1979).

Training materials have been created (some of which are
commercially available) to teach "test-taking skills" (e.g.,
Mini-Tests, 1979 and Test-Taking Skills Kit, 1980), and claims have
been made that such training leads to increased test scores (e.g.,
Fueyo, 1977; Jones & Ligon, 1981; Samson, 1984). The rationale
for such training programs stems from the common practice of
utilizing results from achievement tests to assist in making
decisions about educational placement, programming, and
evaluation. To the degree that achievement tests are measuring
test-taking skills rather than mastery of the content being tested
(e.g., reading, math), decisions about placement, programming, and
evaluation may be incorrect (see Ebel, 1965, for additional
discussion). Promoters of teaching test-taking skills have
claimed that students would obtain higher scores if deficiencies
in test-taking skills were remediated, thus resulting in a more
valid indicator of how well the student had mastered the content
the test was designed to assess.

Although efforts to reduce measurement error in standardized
achievement testing are commendable, several questions remain:

1. Although many people have concluded that test-taking
skills training leads to increased test scores, is that position
consistently supported empirically, and what is the magnitude of
typically obtained effects?

2. Can the cost of typical test-taking training programs be
justified in view of the magnitude of observed effects and the
alternative uses of the same resource (i.e., is it
cost-effective)?

3. Are some types of training more valuable than others in
increasing performance on achievement tests, and are some groups
of children more likely than others to benefit from such
training?

The purpose of the present investigation was to
integrate the results from previous research to answer the
preceding questions as they pertain to standardized achievement
tests with elementary school-aged children.

Review of Previous Work

Several reviewers have previously examined the effects of
teaching test-taking skills (Bangert-Drowns, Kulik, & Kulik, 1983;
Ford, 1973; Fueyo, 1977; Jones & Ligon, 1981; Sarnacki, 1979;
Taylor, 1981). A summary of the characteristics and conclusions
of these reviewers is shown in Table 1.



Insert Table 1 about here



All previous reviewers concluded that test-taking skills could be
taught effectively and resulted in benefits for children
(including higher achievement test scores). Unfortunately, except
for Bangert-Drowns et al. (1983) and Taylor (1981), previous
reviews failed to indicate the procedures or criteria for
including research studies in their review, did not cite and
critique prior reviews, and apparently only analyzed results of
the primary research included in their review in terms of the
original researcher's conclusions. As will be shown below, all of
the reviewers failed to include a substantial number of studies
with elementary aged children. Consequently, one cannot be
confident that results cited in these reviews are representative
of available research. It is also difficult to draw conclusions
about the magnitude of the alleged effect of training students in
test-taking skills since most of the reviewers stated only that
differences were found, or improvement was noted, and occasionally
referred to statistically significant differences between groups.
Without knowing more about the magnitude of the effect
attributable to teaching test-taking skills, it is difficult to
draw conclusions about whether it is likely to be a wise
investment to divert resources from other activities (e.g.,
teaching reading) to teach test-taking skills.

Taylor (1981) conducted an excellent review on the effects of
practice, coaching, and reinforcement on test scores. This
investigation focused upon all age levels and on group-administered
as well as individually administered tests. The
great majority of studies selected, in fact, concentrated on
either IQ tests or non-elementary age populations; consequently, a
substantial number of studies which investigated the effects of
training achievement test-taking skills with elementary-aged
children were not included in her review.

The most comprehensive analysis to date of the effect of
teaching test-taking skills on achievement test scores was a
meta-analysis recently completed by Bangert-Drowns et al. (1983).
The effect of teaching test-taking skills for elementary- and
secondary-aged children was analyzed by computing a standardized
mean difference effect size for each study (Glass, 1977) to
indicate the extent to which achievement test scores were altered
by training. This was a substantial improvement from most earlier
reviews, which relied primarily on authors' conclusions or tests of
statistical significance without indicating the magnitude of
effects. Knowing the magnitude of improvement is very important
so that practitioners can make judgments concerning whether the
investment in training is cost-effective compared to what else
could have been accomplished with that time. Bangert-Drowns et
al. (1983) concluded that teaching test-taking skills raised
standardized achievement test scores by .25 standard deviations,
enough to raise the typical student from the 50th to the 60th
percentile. They also concluded that length of training program
was positively related to effect size; drill and practice was less
effective than training in "broad cognitive skills"; and
effectiveness of training was not affected by identifiable subject
characteristics or other characteristics of the program.
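
The translation of an effect size into a percentile shift can be checked with a short Python sketch. It assumes that test scores are normally distributed, which is the usual convention for this conversion but is an assumption here; the function name percentile_after_shift is illustrative only.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size, start_percentile=50.0):
    """Percentile reached after moving up by effect_size standard deviations.

    Assumes a normal distribution of scores.
    """
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = unit_normal.inv_cdf(start_percentile / 100)
    return 100 * unit_normal.cdf(start_z + effect_size)


# A gain of .25 standard deviations for a student at the 50th percentile.
new_percentile = percentile_after_shift(0.25)
```

Under that assumption a .25 gain places the typical student just below the 60th percentile, in line with the figure quoted above.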

Although Bangert-Drowns et al. provided valuable information,
their study is limited by several factors. First, a number of
studies have been done which were not included in their review.
Secondly, although indicators of study quality were coded, there
was no report of efforts to determine if there were differential
effects for studies of high versus low quality. It may be, for
example, that investigations of lower quality produce effect sizes
which are substantially different from (and also less credible
than) those of studies of high quality.

Third, their decision to average all outcomes from a given
study into one measure of effect size can be misleading. For
example, Levine (1980) randomly assigned low SES and not low SES
fifth graders to either test-taking training or control groups and
collected data on students' scores on standardized reading
achievement and an assessment of "test-wiseness." Four obvious
effect sizes are possible: low SES experimental versus control
for reading and test-wiseness; and not low SES experimental versus
control for reading and test-wiseness. These four effect sizes
range from .38 to 1.52 and average .90. To report only the
average of all four is not only misleading, but irretrievably
obscures important differences between types of subjects and types
of outcome (e.g., in this study the effects for low SES subjects
were much larger than for "not low SES" subjects for both outcomes,
and effects for test-wiseness were much larger than for reading
achievement for both groups).

Finally, in some instances Bangert-Drowns et al. appear to
have used inappropriate computations for determining the effect
size. For example, in the Romberg (1978) study, classrooms were
randomly assigned to treatments, and class averages were used as
the unit of analysis. While the use of classroom means as the
unit of analysis is an appropriate statistical procedure (Peckham,
Glass, & Hopkins, 1969), the standard deviation of group means
will generally be much smaller than the within-group standard
deviation. The use of the between-class standard deviation will
result in a much larger effect size and will not be comparable to
studies for which the within-group standard deviation was used.
In the Romberg study, Bangert-Drowns et al. apparently used the
between-class standard deviation for achievement test scores and
obtained an effect size of .48. By contrast, the present authors
estimated the effect size (since within-group standard deviations
were not reported) by converting the reported percentile scores to
z scores and using differences in z scores as the effect size.
This procedure yielded an effect size based on the within-group
standard deviation of only .14, less than one third the magnitude
of the Bangert-Drowns et al. estimate.
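
The percentile-to-z conversion described above can be sketched as follows. The percentile values are hypothetical (the Romberg percentiles are not given in this text), the normality of the norm-group scores is assumed, and the function name is invented for illustration.

```python
from statistics import NormalDist


def effect_size_from_percentiles(treatment_percentile, control_percentile):
    """Difference in z scores implied by two group percentile ranks.

    Assumes the percentiles refer to a normally distributed norm group, so
    the result is expressed in within-group standard deviation units.
    """
    unit_normal = NormalDist()
    z_treatment = unit_normal.inv_cdf(treatment_percentile / 100)
    z_control = unit_normal.inv_cdf(control_percentile / 100)
    return z_treatment - z_control


# Hypothetical percentile ranks, not the values reported by Romberg (1978).
example = effect_size_from_percentiles(55.0, 50.0)
```

Because percentile ranks are defined against the distribution of individual students, the resulting difference is on the same scale as effect sizes computed from within-group standard deviations.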

Other important questions remain unaddressed by
Bangert-Drowns et al. (1983). First, many investigators believe that the
training of test-taking skills is particularly beneficial for
children in low socioeconomic settings (e.g., Jones & Ligon, 1981;
Jongsma & Warshauer, 1975). Thus, it is important to determine
whether teaching test-taking skills has a different effect on
children of low socioeconomic status than it does on children who
do not come from such groups. Secondly, it is important to
determine whether the effects of training in test-taking skills
are different for children of different ages. In the
Bangert-Drowns et al. study, students in grades 1 to 6 were combined into
one category. Third, it is important to replicate their findings
about length of training and type of training, and to determine
whether there are any other important concomitant variables or
interactions among variables not identified by Bangert-Drowns et
al. Finally, it is important to know whether studies of adequate
validity produce different effect sizes from studies of less than
adequate validity, and whether there is a differential effect for
different types of dependent measures (e.g., achievement tests,
measures of test-wiseness, student attitude).

Procedure

Location of studies. Several procedures were used to find as
many studies as possible which investigated the effect on
group-administered standardized achievement test scores of teaching
test-taking skills to elementary-aged school children. Studies which
examined attempts to improve, for example, scores on
individualized achievement tests or IQ tests were excluded from
this analysis. Also excluded from analysis were studies which
investigated the effects of training on achievement test
performance of students of greater than 6th grade level.

Studies were located by first conducting a computer-assisted
search of Dissertation Abstracts International, Psychological
Abstracts, and Educational Resources Information Center (ERIC)
data bases. Studies found in this way were examined to determine
whether they contained references to other appropriate studies.
Previous reviews of research on teaching test-taking skills
(Bangert-Drowns et al., 1983; Ford, 1973; Fueyo, 1977; Jones &
Ligon, 1981; Sarnacki, 1979; Taylor, 1981) were also examined for
additional studies. Twenty-four experimental studies of the
effects of teaching test-taking skills on achievement tests for
students in grades 1 through 6 were located. This number is 70%
greater than the greatest number of studies involving achievement
tests for elementary school children found by any previous
reviewer.

Coding. Each study was coded for 14 different variables
which described the type of subjects with whom the research was
conducted, the type of training provided, the experimental design
used, and the type of outcome data collected. The specific
variables coded are reported in Table 2 in the results section.
Interrater consistency was established by having two independent
reviewers code each article. Wherever disagreement occurred,
differences were resolved by discussion.

To enable the comparison of all outcomes across all studies,
an effect size for each relevant comparison was computed (Glass,
McGaw, & Smith, 1981). Effect size was defined as the mean
difference between two groups divided by the standard deviation of
the control group. When means and standard deviations are not
reported in a study, effect sizes can also be calculated from
other statistics such as t and F. Basic conventions for
determining which effect sizes to code, and methods of calculation
when means and standard deviations were not available, are given
in Casto, White, and Taylor (1983).
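
The effect size definition above (mean difference divided by the control group standard deviation) can be written as a minimal Python sketch. The scores are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def control_sd_effect_size(treatment_scores, control_scores):
    """Mean difference between groups in control-group standard deviation units."""
    control_sd = stdev(control_scores)  # sample SD (n - 1); an assumption
    if control_sd == 0:
        raise ValueError("control group scores show no variation")
    return (mean(treatment_scores) - mean(control_scores)) / control_sd


# Hypothetical scores for illustration only.
treated = [12, 14, 16]
controls = [10, 12, 14]
example_effect = control_sd_effect_size(treated, controls)
```

Dividing by the control group standard deviation alone, rather than a pooled value, keeps the yardstick unaffected by any change in score variability that the training itself might produce.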
In addition, obtained effect sizes were adjusted using
Hedges' (1981) formula for bias correction of the effect size
estimator before analyses were done. Although the correction
procedure was used for all results in the present study, the
authors agree with Bangert-Drowns et al. that the overall
difference in effect sizes due to this correction procedure was
trivial (only 1 out of 65 effect sizes changed by more than .01 of
an effect size).
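
A sketch of the bias correction follows, using the widely cited approximation to Hedges' (1981) correction factor, 1 - 3/(4m - 1), where m is the degrees of freedom of the standard deviation in the denominator. Whether the authors applied this approximation or the exact factor is not stated, so the choice here is an assumption, as are the function name and the example values.

```python
def bias_corrected_effect_size(effect_size, df):
    """Shrink an effect size by the approximate small-sample correction factor.

    df is the degrees of freedom of the standard deviation estimate, for
    example n_control - 1 when the control group SD is the denominator.
    """
    if df < 2:
        raise ValueError("the approximation needs at least 2 degrees of freedom")
    return effect_size * (1 - 3 / (4 * df - 1))


# Hypothetical case: an effect size of .50 with a control group of 20 students.
corrected = bias_corrected_effect_size(0.50, df=19)
```

With samples of classroom size or larger the factor is close to 1, which is consistent with the observation above that the correction rarely moved an effect size by more than .01.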

Results and Discussion

The 24 investigations of the effect of teaching test-taking
skills resulted in 65 effect sizes which were relatively evenly
distributed among studies. The mean effect size for all
comparisons including achievement tests, tests of test-wiseness,
self-esteem, and anxiety was .21, a figure which is consistent
with that of Bangert-Drowns et al. but should be interpreted with
caution since it is the average across different types of
dependent measures, studies of differing quality, and students
with different characteristics.

Table 2 shows the mean effect size for all levels of the
different variables coded in the meta-analysis. As can be seen,
the average effect size for studies with adequate validity is
relatively close to that of studies with inadequate validity (.20
vs. .29). Although this suggests that it may not be necessary to
account for quality of study in interpreting the impact of
training students in test-taking skills, further examination of
Table 2 shows that this is not the case. In particular, we note
that the average of 44 effect sizes for achievement test scores
from studies of adequate validity is .10, while the average of 6
effect sizes from adequate studies measuring "test-wiseness" is
.71, almost 10 times as large. There are also no measures of
test-wiseness or measures such as anxiety, self-esteem, and
attitude towards the test which come from studies with inadequate
validity. Thus, the apparent equivalence in average effect sizes
between studies of adequate validity and inadequate validity is
largely attributable to the fact that outcomes other than
achievement all come from studies of adequate validity and yield
substantially higher effect sizes than measures of achievement.

Insert Table 2 about here

The mean effect size for achievement test scores from studies
of adequate validity is only .10 compared to an average of .29 for
achievement test scores for studies with inadequate validity.
This contrasts sharply with the findings of Bangert-Drowns et al.,
who reported an average effect size of .25. Part of the reason
that Bangert-Drowns et al. found a higher average effect size may
have been that they collapsed several different outcome measures
from the study into one average effect size. As noted above, this
can be misleading and prevents analyses of important issues.

Because there is such a dramatic difference in average effect
size between studies with adequate validity and studies with
inadequate validity, and between measures of achievement and other
measures, the remaining analyses will focus primarily on effect
sizes of achievement tests from studies with adequate validity.

The mean effect sizes for achievement test scores from
studies with adequate validity for different levels of length of
treatment, SES level, and grade level are shown in Table 3.

Insert Table 3 about here

As can be seen, there was considerable difference between
interventions which were less than 4 hours and those which were 4
or more hours (.04 vs. .29). A similar finding was seen when
results of achievement test scores were broken down by grade
level. When treatments were administered to students in the
primary grades (1-3), the average effect size on standardized
achievement tests was only .01. From grades 4-6, however, the
mean effect size for achievement tests was much higher, .20. The
difference between students of differing socioeconomic backgrounds
was very slight (ES = .14 vs. ES = .09), with a very small
advantage for students from low socioeconomic backgrounds.
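
As a rough consistency check on these subgroup means, the sketch
below recombines them into an overall mean, weighting each subgroup
by the number of effect sizes listed for it in Table 3 (22 and 22
for the two grade bands, 13 and 31 for the two SES levels). The
helper name is hypothetical, and weighting by the number of effect
sizes is an assumption about how the overall mean was formed.

```python
def weighted_mean(pairs):
    """Mean of subgroup effect sizes weighted by number of effect sizes.

    pairs: list of (mean_effect_size, number_of_effect_sizes) tuples.
    """
    total_n = sum(n for _, n in pairs)
    return sum(es * n for es, n in pairs) / total_n


# Subgroup means and counts as reported in Table 3.
by_grade = [(0.01, 22), (0.20, 22)]   # grades 1-3, grades 4-6
by_ses = [(0.14, 13), (0.09, 31)]     # low SES, not low SES

grade_mean = weighted_mean(by_grade)
ses_mean = weighted_mean(by_ses)

print(f"recombined over grade levels: {grade_mean:.3f}")
print(f"recombined over SES levels:   {ses_mean:.3f}")
```

Both recombined values come out at roughly .10, which agrees with
the overall mean of .10 reported for achievement test scores from
studies of adequate validity.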

Even more interesting than the average effect size for
different levels of these three variables are the interactions
between the variables. As can be seen in Figure 1, for treatments
involving less than 4 hours, students in the primary grades
exhibited slightly negative effect sizes (ES = -.12), while
students from grades 4 through 6 had an average effect size of
.19. For students receiving more than 4 hours of training,
however, there is no difference: students in both grades 1-3 and
4-6 had an average effect size of .29. Although the mean effect
size for students in grades 1-3 with 4 or more hours of treatment
is based on only four studies, these data are provocative and
require further investigation. More specifically, it appears that
for older students, a short amount of training in test-taking
skills may result in substantial improvement. However, for
younger children, it takes much more training before there are
observable benefits.

Figure 2 shows another interesting interaction between length
of training and socioeconomic status. With less than 4 hours of
treatment, neither "low SES" nor "not low SES" subjects benefited
appreciably (average effect sizes are .05 and .08). With high
levels of treatment, students from low socioeconomic backgrounds
benefit more than twice as much as students who are not from low
socioeconomic backgrounds (average effect size = .44 vs. .20).
Again, this finding requires further replication before confident
conclusions can be drawn, but it suggests that authors who have
contended that training in test-taking skills is most important
for students from low socioeconomic backgrounds (e.g., Jones &
Ligon, 1981; Jongsma & Warshauer, 1975) may be correct.

Before drawing conclusions about the efficacy of training
students in test-taking skills, it is important to comment briefly
on the differences in average effect sizes between outcomes of
achievement test scores (ES = .10), tests of test-wiseness
(ES = .71), and measures of anxiety, self-esteem, and attitude
towards tests (ES = .44). Admittedly, the measures other than
scores on achievement tests are based on a very limited number of
studies, so one should be cautious in drawing conclusions.
However, from these data, it appears that tests of test-wiseness
are more sensitive to training effects. One explanation for this
much larger average effect size is that the training program is
"teaching to the test." The fact that high scores on tests of
test-wiseness are not necessarily related to higher achievement
test scores suggests that the relation between test-wiseness and
high scores on achievement tests is not very strong. It should be
remembered that the primary argument for providing training in
test-taking skills to students has always been related to the need
to reduce measurement errors in the child's standardized test
score. To the degree that that is happening, it has been assumed
that test scores would go up. Although the fact that test scores
are not going up appreciably is not proof that scores are not more
accurate, it still leaves the burden of proof upon those who claim
that training in test-taking skills is beneficial. Higher scores
on tests of test-wiseness are not sufficient evidence for those
benefits.

Conclusions

As noted earlier, this integrative review was designed to
answer the following three questions:

1. To what degree is the popular position that training in
test-taking skills is beneficial for children supported by
empirical evidence?

2. Do the data about the effect of teaching test-taking
skills justify the use of resources for this purpose as opposed to
alternative uses of the same resources?

3. Are some types of training more effective or are some
groups of children more likely to benefit from training in
test-taking skills?

In response to the first question, the results of this review
stand out in contrast with those of all previous reviewers of the
effects of training in test-taking skills. The most credible
evidence (results from high quality studies limited to scores on
standardized achievement tests), at least as it pertains to
elementary school-aged children, does not demonstrate a sizeable
benefit for teaching test-taking skills. The reason for these
different conclusions is partly attributable to the use of more
systematic techniques than used by many of the previous reviewers
to identify the magnitude of the effect and how that effect
covaried with other variables. More importantly, a larger number
of studies was identified, and quality of study and type of
outcome were accounted for.

Is training in test-taking skills cost effective? The answer
is not clear-cut. Clearly, benefits of a tenth of a standard
deviation are relatively small (less than one month worth of gain
in reading for an average third grader), but they were obtained at
relatively little cost. Even the longest training program lasted
only 20 hours, and the majority of effect sizes came from studies
in which training lasted less than 4 hours. The question also
depends in part on whether one is talking about children in grades
1-3 or grades 4-6. These data suggest that for older children, a
limited amount of training can have a discernible effect. For
younger children, more training is necessary. Also, the fact that
a few studies (unfortunately, it is a very limited number) suggest
that training in test-taking skills has some positive impact on
anxiety, self-esteem, and attitude towards tests should not be
forgotten. However, before it is accepted as fact, more research
needs to be done. It is clear that a comprehensive analysis of
previous research on training test-taking skills suggests that the
benefits are not nearly so great as has typically been concluded.

Data from the meta-analysis do suggest that training in
test-taking skills is differentially effective for various
subgroups of children. The interactions between length of
treatment and grade level, and length of treatment and SES, are
particularly provocative and deserve further research. In
general, the meta-analysis supports the conclusion of
Bangert-Drowns et al. that longer training programs are more
effective. As a general strategy, it also appears that training
is more effective in the upper elementary grades than in the lower
elementary grades. Whether or not a training package includes
practice tests, reinforcement, or drill and practice does not seem
to be an issue about which we have sufficient data to draw
conclusions. More research is needed before we can decide what
types of training are most effective.

Should training in test-taking skills be pursued? Hopefully,
the results of this analysis will temper some of the unfounded
enthusiasm in support of training children in test-taking skills.
However, it would be unwise to conclude that training in
test-taking skills is unwarranted or detrimental. Although the
effects of such training are small, the investment is relatively
cheap, and there is some evidence that for particular groups of
children, training in test-taking skills can have substantial
effects. Those tentative conclusions need further research, but
indicate an area worth pursuing.
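
Table 1

Characteristics and Conclusions of Previous Reviewers of the
Effect of Teaching Test-Taking Skills

Bangert-Drowns et al./1983
    Number of experimental studies cited: 30
    Methods for selecting studies specified? Yes
    Previous reviewers cited and critiqued: No
    Outcomes of experimental studies cited in terms of:
        Standardized effect size
    Conclusions about effectiveness of training test-taking skills:
        Effective (ES = .25)
    Variables cited which covary with effect of training:
        Length of training program, type of training
    Type of studies included:
        Achievement tests; elementary and secondary level

Ford/1973
    Number of experimental studies cited: 24
    Methods for selecting studies specified? No
    Previous reviewers cited and critiqued: No
    Outcomes of experimental studies cited in terms of: Conclusions
    Conclusions about effectiveness of training test-taking skills:
        Effective
    Variables cited which covary with effect of training: None
    Type of studies included:
        Achievement, IQ, and aptitude tests; preschool through adult

Fueyo/1977
    Number of experimental studies cited: 19
    Methods for selecting studies specified? No
    Previous reviewers cited and critiqued: No
    Outcomes of experimental studies cited in terms of: Conclusions
    Conclusions about effectiveness of training test-taking skills:
        Effective
    Variables cited which covary with effect of training: None
    Type of studies included:
        Achievement, IQ, and aptitude tests; preschool through adult

Jones & Ligon/1981
    Number of experimental studies cited: 5
    Methods for selecting studies specified? No
    Previous reviewers cited and critiqued: No
    Outcomes of experimental studies cited in terms of: Conclusions
    Conclusions about effectiveness of training test-taking skills:
        Effective
    Variables cited which covary with effect of training:
        Maintenance of effect, socioeconomic status
    Type of studies included:
        Achievement, IQ, and aptitude tests; preschool through adult

Sarnacki/1979
    Number of experimental studies cited: 17
    Methods for selecting studies specified? No
    Previous reviewers cited and critiqued: No
    Outcomes of experimental studies cited in terms of: Conclusions
    Conclusions about effectiveness of training test-taking skills:
        Effective
    Variables cited which covary with effect of training: None
    Type of studies included:
        Achievement, IQ, and aptitude tests; preschool through adult

Taylor/1981
    Number of experimental studies cited: 34
    Methods for selecting studies specified? Yes
    Previous reviewers cited and critiqued: Yes
    Outcomes of experimental studies cited in terms of:
        Standardized effect size
    Conclusions about effectiveness of training test-taking skills:
        Effective (ES = .62)
    Variables cited which covary with effect of training:
        Type of training, unit of administration, quality of study,
        type of test (achievement vs. IQ)
    Type of studies included:
        Achievement, IQ, and aptitude tests; preschool through adult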

Table 2

Mean Effect Size for All Levels of All Coded Variables

Each entry gives ES, SD(ES), and N(ES).

All studies
    Adequate validity:    .20 .40 55
    Inadequate validity:  .29 .33 10

Total sample size for study
    Levels:               Small (0-75); Medium (76-150); Large (150+)
    Adequate validity:    .32 .28 21; .11 .50 24; .15 .30 10
    Inadequate validity:  .40 .46 5; .18 .08 5

Grade level
    Levels:               1st-3rd; 4th-6th
    Adequate validity:    .03 .51 25; .33 .39 30
    Inadequate validity:  .14 .06 6; .59 .54 3

Socioeconomic status level
    Levels:               Low; Not low
    Adequate validity:    .18 .37 37; .24 .46 18
    Inadequate validity:  .33 .36 8; .11 .02 2

Use of reinforcement procedures as part of training
    Levels:               No; Yes
    Adequate validity:    .22 .40 48; -.00 .43 7
    Inadequate validity:  .29 .33 10

Hours of training
    Levels:               Less than 1 hr; 1 to 3 hrs; 4 hrs+
    Adequate validity:    .09 .43 14; .09 .30 22; .40 .42 19
    Inadequate validity:  .37 .47 5; .20 .13 4

Use of practice tests as part of training
    Levels:               No; Yes
    Adequate validity:    .22 .43 42; .12 .30 13
    Inadequate validity:  .40 .46 5; .16 .07 4

Ability level of students
    Levels:               Mixed; High ability; Low ability
    Adequate validity:    .20 .52 17; .09 .21 3; .31 .12 5
    Inadequate validity:  [illegible]

Type of assignment to groups
    Levels:               Random; Good matching; Poor matching
    Adequate validity:    .27 .39 40; -.24 .01 2; .05 .37 13
    Inadequate validity:  .40 [SD illegible] 7; .28 .10 3

Blinding of data collector
    Levels:               Yes; No
    Adequate validity:    .13 .44 34; .31 .30 21
    Inadequate validity:  .16 .07 4; .38 .42 6

Type of outcome measure
    Levels:               Achievement test; Test-wiseness test;
                          Other (anxiety, self-esteem, attitude)
    Adequate validity:    .10 .33 44; .71 .57 5; .44 .36 6
    Inadequate validity:  .29 .33 10

ES = mean effect size for a particular group.

SD(ES) = standard deviation of effect size distribution for a
particular group.

N(ES) = number of effect sizes on which a computation is based.

Note. Several other variables including Percent Male, Percent
Handicapped, and Percent Minority were coded to determine whether mean
effect size covaried with such subject characteristics. Results for those
variables are not reported here because of infrequent reporting (e.g.,
Percent Handicapped could only be coded for 2% of the ES's), or lack of
variance (e.g., 97% of the ES's for Percent Male fell between 47% and 54%).

Table 3

Mean Effect Sizes on Achievement Test Scores, Broken Down
by Treatment Length, SES Level, and Grade Level*

                                  Mean ES   SD ES   N ES   N Studies

Less than 4 hours of treatment      .04      .30     18        7
4 or more hours of treatment        .29      .31     13        8
Low SES                             .14      .38     13       10
Not low SES                         .09      .31     31       13
Grades 1-3                          .01      .37     22        9
Grades 4-6                          .20      .26     22        9

*Achievement test scores, studies with adequate validity only.

Figure Captions

Figure 1. Mean effect size by treatment length and grade level.

Figure 2. Mean effect size by treatment length and SES.

[Figure 2: mean effect size for Low SES and Not low SES groups at
"Less than 4 hours of treatment" and "4 or more hours of treatment."
Point labels: 12 ES's from 6 studies; 5 ES's from 5 studies; 6 ES's
from 4 studies; 8 ES's from 4 studies.]

[Figure 1: mean effect size for Grades 4-6 and Grades 1-3 at "Less
than 4 hours of treatment" and "4 or more hours of treatment." Point
labels: 9 ESs from 3 studies; 9 ESs from 5 studies; 4 ESs from 3
studies; 6 ESs from 4 studies.]

Abstract

Fifty-eight third graders from two elementary school classrooms
were assigned at random to test-training and placebo groups.
Students in the test-training group received six sessions of
test-wiseness training specifically tailored to the Comprehensive
Test of Basic Skills. Students in the placebo group received six
sessions of creative writing exercises. The effectiveness of this
training on achievement tests was obscured due to the presence of
ceiling effects. Supplementary analyses, however, provided some
support for the effectiveness of this training. Trained and
untrained groups were not seen to differ on measures of on-task
behavior during the testing situation. An analysis of reported
attitudes toward tests taken immediately after the three-day
testing period indicated that (a) the standardized test experience
was a stressful one for control subjects, and (b) the
test-wiseness training had exerted a significant ameliorating
effect in the treatment group. Results indicated that
test-wiseness training may reduce levels of anxiety in elementary
school children during test situations.

The Effects of Training in Test-Taking Skills on
Test Performance, Attitudes, and On-Task
Behavior of Elementary School Children

In recent years, the effectiveness of coaching on achievement
test performance has been well studied (see Sarnacki, 1979, and
Fueyo, 1976, for reviews). In a recent meta-analysis,
Bangert-Drowns, Kulik, and Kulik (1983) determined that coaching
for achievement tests in the elementary grades produced a
generally facilitative effect (ES = .29) over all studies
reviewed. More recently, Scruggs, Bennion, and White (1984) have
argued that although training in test-taking skills does often
produce an effect in the elementary school grades, this effect is
dependent upon other factors, for example, length of training, age
of students, and economic level of the students trained. Although
researchers in the area of test-wiseness training have often
looked at variables in addition to actual test scores, such as
performance on test-wiseness tests and self-esteem, they have not
addressed the issue of whether or not such training changes in any
way the attitudes of elementary school children toward tests.
This in itself could be an important finding for, considering the
degree to which school-age children are subjected to testing
procedures, it would be helpful to ensure that such tests were not
unnecessarily traumatic. In addition, whether or not training in
test-taking skills has a facilitative influence on the level of
effort the students put into the test situation remains unclear.
Such effort may be evaluated by means of the amount of time
on-task students put into the standardized achievement test.

The present investigation was intended to address some of
these issues by providing training in test-taking skills to a
sample of third grade students and assessing, in addition to test
performance, reported attitudes towards the test-taking
experience and percent of time actually spent on-task during test
administration. Although the effects of test-wiseness training
have been well documented in the past, the present investigation
was intended to shed some light on peripheral issues and to
address more specifically exactly what changes in attention and
attitude occur as a result of coaching on achievement tests.

Method

Subjects

Subjects were 58 elementary-age school children attending the
third grade in two different classrooms in a western rural school
district. Sex was evenly distributed. Subjects were selected at
random from both classes to participate in treatment and placebo
groups.

Materials

Materials included a manual with six scripted 20- to 30-
minute lessons in test-taking skills specifically tailored to the
Comprehensive Test of Basic Skills, Level E. These materials were
developed specifically for this project and also included student
workbooks for practice activities by the students on the Reading
Subtest of this test (Williams, 1984).

Procedure

Over a two-week period, treatment students were taught six
lessons in test-taking skills appropriate to the Reading Subtest
of the Comprehensive Test of Basic Skills (CTBS). These lessons
included, for example, time-using strategies, deductive reading
strategies, error avoidance strategies, and specific practice
activities in each of the subtests. To control for possible
Hawthorne effects, the placebo group was given six exercises in
creative writing at the same time treatment students were
receiving test training. Immediately prior to and immediately
after training, students in the training group were given pre and
posttests of test-taking skills to determine whether students had
learned from the training package. This measure is shown in
Table 1.

Insert Table 1 about here

Within three days after the conclusion of training, students were
given the CTBS by their regular classroom teachers in their
regular instructional class. During the taking of this test,
observational measures were taken on on-task behavior of students
by four trained observers unaware of group memberships of the
students being observed. The observers employed a time-sampling
procedure on an interval of 30 seconds. Each student observed was
observed for 30 minutes. On-task behavior was computed as
percentage of times sampled on-task during actual test performance
and on-task behavior while directions were being given. On-task
behavior during directions was defined as orientation of student's
eyes toward either teacher or test booklet and pencil-and-paper
compliance with accompanying sample activities. On-task during
testing was defined as student's eyes directed toward test
booklet, pencil in hand, actively marking, reading, or asking
teacher direct questions with specific reference to the test.
After completion of the third and final day of testing, students
were given an attitude toward tests questionnaire (see Figure 1).
This questionnaire consisted of 10 items in an agree/disagree
format. Students completed the questionnaire together while the
teacher read the items to the class.

Insert Figure 1 about here
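
The on-task measure described in the procedure can be sketched as
follows. The sketch assumes one observation at every 30-second
interval across the 30-minute period, which gives 60 samples per
student; the function name and the example record are hypothetical.

```python
INTERVAL_SECONDS = 30      # sampling interval stated in the procedure
OBSERVATION_MINUTES = 30   # observation period per student


def percent_on_task(samples):
    """Percentage of sampled moments at which the student was on-task.

    samples: sequence of booleans, one per sampled moment.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    on_task = sum(1 for s in samples if s)
    return 100.0 * on_task / len(samples)


# Number of samples implied by the stated interval and period.
n_samples = OBSERVATION_MINUTES * 60 // INTERVAL_SECONDS

# Hypothetical record: on-task at 45 of the sampled moments.
record = [True] * 45 + [False] * (n_samples - 45)
pct = percent_on_task(record)

print(f"{n_samples} samples, {pct:.1f}% on-task")
```

Separate percentages for the directions and testing portions would
be obtained by applying the same function to the samples taken
during each portion.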

Results

Pre and posttest scores of the treatment group on the measure
of test-taking skills were compared by means of a correlated
t test. On average, students scored 41% correct on the pretest
and 89% on the posttest. These differences were statistically
significant (t(27) = 13.9, p < .001).

Mean scores on the Reading subtests of the CTBS were computed
and compared statistically by means of t tests. As can be seen in
Table 1, none of the group differences are statistically
significant. Interpretation is not possible, however, due to the
presence of overwhelming ceiling effects exhibited on all
subtests.
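
The group comparisons in Table 1 can be approximately reproduced
from the reported means, standard deviations, and group sizes. The
sketch below assumes an ordinary independent-groups t test with a
pooled variance; the report does not state which variant was used,
and the function name is hypothetical. The input values are those
listed in Table 1 for the Vocabulary and Total Reading scores.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-groups t statistic computed from summary statistics.

    Assumption: equal-variance (pooled) form of the test.
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error


# Treatment (Tx) versus placebo (Cx) values from Table 1, 29 per group.
t_vocabulary = pooled_t(26.31, 4.58, 29, 26.90, 4.47, 29)
t_total_reading = pooled_t(82.59, 12.35, 29, 82.14, 14.04, 29)

print(f"Vocabulary:    t = {t_vocabulary:.2f}")
print(f"Total reading: t = {t_total_reading:.2f}")
```

The recomputed values are close to the tabled ones; small
discrepancies are expected because the tabled means and standard
deviations are rounded.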

A supplementary analysis was conducted on the lower half of
each group, chosen by the previous year's total reading scores,
and is given in Table 2. This analysis indicates that
standardized gain scores between second and third grade testing
were significantly higher in favor of the treatment group on the
Word Attack Subtest and Total Reading Score.

On-Task Behavior

Mean on-task behavior during directions, during testing, and
total is given in Table 1. As can be seen, no significant group
differences were found.

Attitudes Toward Tests

Reliability of the attitude measure was computed by means of
a Kuder-Richardson 20 formula and was found to be .88, indicating
a moderately strong degree of internal consistency for a measure
of this type. Differences between the mean scores of the two
groups were nonsignificant, t less than 1 in absolute value. An
inspection of Figure 2, however, shows that the distributions of
these two groups differ strongly. These differences are most
obvious when one employs a curve-smoothing technique of combining
the mean scores for each of two adjacent frequencies; the smoothed
curves are given in the same figure.

Insert Figure 2 about here

The difference between these dispersions was tested statistically
in two ways. First, mean differences from the mean in standard
scores were computed for subjects in each group and compared
statistically. The mean distance from the mean of the placebo
group was statistically greater than the average distance from the
mean in the training group (p < .01). In addition, a
Kolmogorov-Smirnov two-sample test was applied to each half of the
distribution. For the lower half of each distribution (that is,
students scoring 0 through 5 on the measure), the distributions
were statistically different (Z = 1.529, p < .02), while the upper
half of each distribution was not seen to differ significantly
(Z = .756, p = .617).
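
For readers unfamiliar with the Kuder-Richardson 20 coefficient
reported at the start of this section, a minimal sketch of the
computation follows. The response matrix is hypothetical and is not
the study's data. The sketch assumes that items are scored 0 or 1,
that negatively worded items have already been reverse-scored, and
that the population form of the total-score variance is used.

```python
def kr20(responses):
    """Kuder-Richardson 20 reliability for dichotomous (0/1) items.

    responses: list of respondents, each a list of 0/1 item scores.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(r) for r in responses]
    mean_total = sum(totals) / n
    # Population variance of total scores (divides by n).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum over items of p * q, where p is the proportion scoring 1.
    pq_sum = 0.0
    for j in range(k):
        p = sum(r[j] for r in responses) / n
        pq_sum += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - pq_sum / var_total)


# Hypothetical data: 4 respondents answering 3 items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(example)
print(f"KR-20 = {reliability:.2f}")
```

The coefficient rises as the items covary more strongly relative to
their individual variances, which is why it is read as an index of
internal consistency.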

Discussion

The present investigation does not offer conclusive evidence
that the particular training package employed significantly
improved test scores, due to the ceiling effects reported in the
Results section. However, it is thought that many students did
benefit from this training for the following reasons: (a)
students in the lower half of the treatment group exhibited
statistically higher gain scores over the previous year's testing
than did the lower half of the placebo group, (b) students in the
treatment group scored significantly higher on a posttest of
test-taking skills than they had on the pretest before training,
and (c) reviews of many studies previously conducted (see Scruggs,
Bennion, & White, 1984) indicate that this type of training
generally has facilitative effects on test-taking performance.
In particular, this training previously demonstrated a significant
effect on a subtest similar to the Word Attack subtest in a sample
of learning disabled and behaviorally disordered children
(Scruggs, 1984).

That achievement test coaching results in greater levels of
on-task behavior on the part of students was not supported by the
present investigation. Student on-task behaviors while listening
to directions and while taking the test itself were very similar.

Analysis of the attitude data did suggest that students in
the treatment group reported more "normal" attitudes than those in
the placebo group. The abnormal distribution of scores in the
placebo group is highly reminiscent of that of a population under
stress (see Wilson, 1973). The fact that the abnormally high
number of very negative attitudes was not present in the treatment
condition, while the number of strongly positive attitudes was
relatively similar, suggests that this treatment may have
contributed to more positive attitudes on the part of those
students who may otherwise have developed strong negative
reactions to the test and the test-taking situation. It should be
noted here that completely positive attitudes toward tests were
not expected and are not necessarily a realistic expectation.
What was expected was a roughly normal distribution centering
around the mean of about 5, which is in fact the distribution seen
in the training group. The large proportion of extreme scores in
the placebo group (with fully two-thirds of the scores within 1
point of 0 or 10) indicates that the population had been subjected
to some stress and had reported widely polarized views on the
test-taking process. In the training group, these attitudes
seemed to have been ameliorated substantially.

Table 1

T-Tests by Group

                                                             2-tail
Variable                 Group    N     Mean      SD     T    prob.

CTBS Reading Subtests

Word attack               Tx     29    29.79    4.87
                                                        .05    .959
                          Cx     29    29.72    5.37

Vocabulary                Tx     29    26.31    4.58
                                                       -.49    .624
                          Cx     29    26.90    4.47

Comprehension             Tx     29    26.48    4.06
                                                        .79    .434
                          Cx     29    25.51    5.21

Total reading             Tx     29    82.59   12.35
                                                        .13    .898
                          Cx     29    82.14   14.04

CTBS total battery        Tx     29   150.17   24.68
                                                       -.60    .549
                          Cx     29   154.03   24.10

Attitude toward           Tx     29     5.59    2.97
test-taking                                             .59    .557
                          Cx     27     5.04    3.95

On-task during            Tx     18    45.28   15.78
directions                                             -.75    .458
                          Cx     18    50.06   21.89

On-task during            Tx     18    77.67   16.18
testing                                                 .07    .941
                          Cx     18    77.28   14.98

Total on-task             Tx     18    65.78   14.76
                                                       -.45    .656
                          Cx     18    67.78   11.82

Table 2

Gain Score Differences Between the Lower Half of Each Group (Chosen
by Last Year's Total Reading)

Variable                Group    N     Mean       SD     Error     T    Prob.

Word attack              Tx     12    25.83     39.55    11.42
                                                                 2.71    .012
                         Cx     14   -20.86     47.06    12.58

Vocabulary               Tx     12    18.67     50.77    14.66
                                                                  .49    .625
                         Cx     14     7.93     58.69    15.69

Comprehension            Tx     12    53.17     37.96    10.96
                                                                 1.46    .158
                         Cx     14    24.79     57.54    15.38

Total of all subtests    Tx     12    97.67     52.64    15.20
                                                                 2.51    .019
                         Cx     14    11.86    107.92    28.84

Figure Captions

Figure 1. Attitude measure.

Figure 2. Distribution of attitude scores.

Circle YES or NO.

1. Taking a test is my favorite thing to do
at school.

2. Sometimes I am nervous when I take a
test.

3. I look forward to taking a test.

4. I dislike taking a test when I don't know
the answers.

5. I wish we had fewer tests.

6. Taking a test is always fun.

7. I like tests even when I don't know the
answers.

8. Taking a test is one of the worst things
about school.

9. I would rather do something else besides
take a test.

10. I wish we had more tests.

Abstract

Ninety-two second, third, and fourth grade children classified as
learning disabled (LD) or behaviorally disordered (BD) were
randomly assigned to treatment and control groups. Students
assigned to the treatment condition were taught test-taking skills
pertinent to reading achievement tests. Students were taught in
small groups over a two-week period in such strategies as
attending to appropriate stimuli, marking answers carefully, time
using, and error avoidance. Following the training procedures,
students were administered standardized achievement tests in their
regular classroom assignments. Results indicated that third and
fourth graders scored significantly higher on the word study
skills subtest and descriptively higher on other subtests of the
Stanford Achievement Test. Second grade students did not appear
to have been affected by the training. The relevance of the
training for this test to other tests involving problem-solving
strategies is discussed.
Teaching Test-Taking Skills to Learning Disabled
and Behaviorally Disordered Children

Successful performance in school is to a great extent
dependent upon the application of effective learning and
problem-solving strategies on academic tasks. Students are often called
upon to meet particular format and task demands on academic
assignments with effective strategies for dealing with these tasks
and successfully completing them. Much of the failure of learning
disabled (LD) students in school-related tasks has been attributed
to a lack of ability in applying such problem-solving strategies
(Reid & Hresko, 1980). A body of literature has been established
in recent years which documents difficulties of learning disabled
students in employing appropriate learning and problem-solving
strategies in school. Particular deficits have been noted in the
areas of: (a) attending to the critical components of a task
(Atkinson & Seunath, 1973; Hallahan & Reeve, 1980; Hallahan,
Kauffman, & Ball, 1973; Ross, 1976; Tarver, Hallahan, Kauffman, &
Ball, 1976), (b) selecting a strategy appropriate to addressing a
particular academic task (Mastropieri, Scruggs, & Levin, in press;
Torgesen, 1977; Torgesen & Goldman, 1977), and (c) effectively
employing appropriate problem-solving strategies (Hallahan, 1975;
Spring & Capps, 1974; Torgesen, Murphy, & Ivey, 1979).

Given the above documented deficiencies, it would appear that
one area of particular difficulty for learning disabled and
perhaps other mildly handicapped children would be the
problem-solving strategies necessary for successful completion of
standardized achievement tests. These group-administered tests
typically expect learners to function individually in large-group
situations, work effectively within time constraints, and develop and
employ strategies specifically suited to answering questions which
may be ambiguous or to which the answers are often not completely
known (Haney & Scott, 1980). Some recent research with learning
disabled students indicates that these students do, in fact,
exhibit deficiencies with respect to use of effective strategies
in standardized test-taking situations. Scruggs and Lifson (1984)
administered questions from standardized reading comprehension
tests to LD and non-LD students without providing the accompanying
reading passages. Their results indicated that, although non-LD
students were able to answer most "reading comprehension"
questions without reading the accompanying passages, LD students
were not able to do this. This investigation reiterated
previously raised questions concerning what reading comprehension
tests actually measure, and also suggested that many LD students
may lack some specific test-taking strategies, such as ability to
effectively employ partial and/or prior knowledge. Drawing upon a
previous investigation with mostly nondisabled children (Scruggs,
Bennion, & Lifson, in press), Scruggs, Bennion, and Lifson (1984)
recently interviewed learning disabled children with respect to
the manner in which they had interpreted and answered reading
achievement test items. Analysis of these strategy reports
indicated that (a) LD students were less likely to select and
utilize strategies appropriate to different types of test
questions, and (b) LD students were more likely to be negatively
influenced by misleading distractors. Such results suggested that
learning disabled and perhaps other mildly handicapped populations
may have more difficulty than other students adapting to specific
task and format demands of standardized achievement tests and,
consequently, resulting scores may be less valid estimations of
potential performance than those of other students. Although any
observed deficit in "test-taking strategies" on the part of
learning disabled children would be expected to be representative
of more global problem-solving strategy deficits in school-related
tasks on the whole, it may be possible that specific training in
test-taking skills may be particularly beneficial to children
referred for mild learning and/or behavior problems. Many
attempts have been previously made to improve achievement test
scores by coaching in test-taking skills, but the results have
been somewhat mixed and have appeared to affect different
populations differentially. For example, Scruggs, Bennion, and
White (1984) in a recent meta-analysis reported that students from
the lower grade levels and students from low economic backgrounds
tended to differentially benefit from extended training in
test-taking skills. This finding, although not directly relevant to
special education, does imply that these students may be taught
some of the critical skills that they apparently lack when
confronted with standardized achievement tests. It was the
purpose of this investigation to determine whether such skills
could be taught and whether such skills could, in fact, increase
performance on standardized achievement tests without an
accompanying increase in knowledge of the task being assessed.

Method 

Subjects

Subjects were 92 second, third, and fourth grade students
attending resource rooms or self-contained classes in a large
western school district. Twenty-five students were second
graders, 37 were third graders, and 31 were attending fourth grade
classes. The 68 boys and 34 girls had tested at an average of the
20th percentile (SD = 9.3) at the previous year's testing in
reading. Thirty-nine students were classified as LD, and 54
students were classified as behaviorally disordered according to
Public Law 94-142 and local school district criteria (for learning
disabilities, this included a 40% discrepancy between ability and
achievement). Twenty-two students were enrolled in self-contained
classes, and 70 students were attending resource rooms.
Materials 

Materials were developed as part of a larger project
involving improving test-taking skills of LD and BD elementary
students (Taylor & Scruggs, 1983) and consisted of eight scripted
lessons for each grade level in a direct instruction format and
accompanying workbooks for students which included
pencil-and-paper practice activities (Scruggs & Williams, 1984). The
general test-taking strategies taught in these materials included
attending, marking answers carefully, choosing the best answer
carefully, error avoidance strategies, and appropriate situations
for soliciting teacher attention. In addition, specific
test-taking strategies were taught for each reading-related subtest
of the Stanford Achievement Test. These included structured
practice in specific test formats for each subtest and specific
application of general test-taking strategies to each subtest.
For example, with respect to the letter-sound subtest, students
were taught to employ the following sequence of strategies:

1. Look at the first word; read it.

2. Pronounce it to yourself and think of the sound of the
underlined letter.

3. Carefully look at the answer choices and choose the word
with the same sound as the underlined letter.

4. If you don't know all the words, read the words you do
know, or read parts of individual words that you may know.

5. If you are not sure of the answer, see if there are some
answers that you are sure are not correct, and eliminate those.

6. Color in the answer quick, dark, and inside the line.

7. Never skip an answer.
Procedure

Experimental subjects were taught in small groups ranging
from one to five in size and were taught four 20-minute lessons
per week, for two weeks. Positive responding and attention to
task was reinforced with stickers. Immediately prior to the
training sessions, and immediately after the last training
session, students were administered a criterion test of the skills
which were taught (see Figure 1). This test was a 10-item test of
test-taking skills including questions about time using, question
asking, and elimination strategies. The first seven sessions
taught the use of test-taking strategies within the specific
context of each of the reading-related subtests. The last session
consisted of a general review of all previous procedures. Each
day of instruction involved extensive work with practice
activities applied to practice test items. At no time during this
training procedure were subjects taught any information concerning
the content of the test which was not given in the published
test directions. Within five days of the training procedure,
students were administered as a group the Stanford Achievement
Test. This administration was done in the regular or
self-contained classroom settings by their regularly assigned teacher.
Although teachers were aware of the membership of each student in
the experimental group, response protocols were scored by
machine.

Results

Pre and posttests of the experimental students on the
criterion measure were compared statistically by means of a
correlated t test. It was found that the performance on the
posttest was significantly higher than pretest scores (p < .01).
Students scored an average of 40% correct on the pretest,
and 77% correct on the posttest.

A summary of analyses is given in Table 1. Data for second
grade students were analyzed separately because (a) sufficient
test data from previous years' testing existed to compute analysis
of covariance, and (b) patterns of effects of treatment appeared
to be somewhat different in this group. Although second grade
subjects were assigned at random to experimental and control
groups, they differed significantly (p < .05) with respect to
previous years' testing, and, therefore, analyses must be
interpreted with caution. Although raw scores on reading subtests
in fact favored the control group, these differences were
decreased substantially by the use of previous years' testing as a
covariate. In spite of this adjustment with the covariate, the
second grade control group apparently statistically outperformed
the treatment group in the comprehension subtest. Since the
groups did differ significantly in the previous year's testing,
however, and since a similar comprehension subtest was not a part
of the first grade test battery, which likely weakened the
covariate, this finding appeared to be an artifact of selection
bias. Third and fourth grade data were also analyzed separately.
However, since over one-third of the third and fourth grade
students were missing previous years' test scores, it was not
possible to use previous years' testing as a covariate. As can be
seen in Table 1, differences generally favored treatment groups
although none of the initial findings were significant at the .05
level. However, the treatment effect was replicated over third and
fourth grade groups with a particular effect seen in the Word Study
subtest raw scores. Effect sizes were .63 in the third grade
students, and .48 with the fourth grade students, both in favor of
the treatment group. An evaluation of third and fourth grade
combined using scale scores, however, indicates a significant
treatment effect for the experimental students on the Word Study
Skills subtest. Although comprehension scores and total reading
scores also favor the treatment group, these differences are not
statistically significant.
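
The report does not state how the effect sizes were computed. A minimal sketch, assuming the effect size is the difference between treatment and control means divided by the control group's standard deviation (an assumption, though it reproduces the reported .63 and .48 from the Word Study raw scores in Table 1), is given below. The function name glass_delta is illustrative and not taken from the report.

```python
def glass_delta(mean_tx, mean_cx, sd_cx):
    """Standardized mean difference scaled by the control group's SD.

    Assumed formula; the report does not say which effect size was used.
    """
    return (mean_tx - mean_cx) / sd_cx


# Word Study raw scores from Table 1: treatment mean, control mean, control SD.
grade3_es = glass_delta(29.12, 24.95, 6.65)
grade4_es = glass_delta(26.53, 21.93, 9.68)

print(f"3rd grade: {grade3_es:.2f}")
print(f"4th grade: {grade4_es:.2f}")
```

Dividing by the control group's standard deviation rather than a pooled value keeps the yardstick unaffected by any change in score variability that the training itself may have produced.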

Discussion 

The analysis of pre and posttest scores indicated that
test-taking skills could be successfully taught to this sample of
second, third, and fourth grade learning disabled and behaviorally
disordered children. The fact that significant gains were made in
these critical skills indicates that learning disabled and
behaviorally disordered children at this age level do, in fact,
lack certain test-taking skills which are potentially helpful in
taking standardized achievement tests.

An analysis of the data apparently suggests that second grade
students did not benefit from the training package. These data
are difficult to interpret accurately, however, considering the
fact that this group of children had scored significantly lower
than the control group on tests administered one year previous.
Although the use of analysis of covariance somewhat compensated
for these differences, any interpretation of the results must be
made with caution considering such significant differences existed
between the two groups in the first place. However, considering
these were reading tests and that the average reading performance
of second grade learning disabled and behaviorally disordered
children is extremely low, it may be that second grade special
education students lack sufficient reading skills in order to make
the most of training in test-taking skills. This may indicate
that it is more prudent to wait until certain critical reading
skills have been mastered before training of this nature will be
beneficial. Considering the previous differences between the
experimental and control group with respect to the second grade
population, however, this interpretation cannot be made
conclusively. Analysis of the third and fourth grade data
indicated that training in test-taking skills did significantly
increase scores on the Word Study Skills subtest of the Stanford
Achievement Test for third and fourth grade learning disabled and
behaviorally disordered students. Differences favoring the
treatment group were also found in all the subtests and total
reading score, although these differences were not significant.
The fact that the Word Study Skills subtest was increased
significantly may be a function of the fact that this particular
subtest involves many format changes over a short period of time
and thus was more amenable to increased performance through
guided practice and feedback on successful skills necessary for
completion of the subtest (Bennion, Scruggs, & Lifson, 1984).
Since previous research has indicated that learning disabled
children are more likely to have difficulty with formats on this
type of subtest (Scruggs, Bennion, & Lifson, 1984), this seems a
likely explanation for the fact that Word Study Skills performance
was significantly facilitated. The degree of facilitation of this
subtest in scale score points apparently compares to a gain of
three academic months for the average student receiving this
treatment. This gain is consistent with the findings of a recent
meta-analysis (Scruggs, Bennion, & White, 1984) which indicated
that other students tended to gain approximately two to three
months in situations involving extended training on test-taking
skills. Although a three-month gain does not seem particularly
large, it must be weighed against the finding that this was
accomplished in eight relatively short lessons over a two-week
period and that training in reading skills over the same period
would be unlikely to produce such a gain. However, any gain at
all which is not the result of training in the associated content
areas indicates the possibility that some of the error variance in
this test is being eliminated and, in fact, Table 1 indicates
descriptively that standard deviations were consistently lower in
treatment groups than control groups. This finding is not
conclusive but does suggest that error was reduced on the part of
treatment children.

Overall, the findings indicate that critical test-taking
skills can be taught to learning disabled and behaviorally
disordered second, third, and fourth grade children and that these
skills tend to raise these students' performance on standardized
achievement tests.
The usefulness of standardized achievement tests in special
education has been, and remains, a controversial issue (Salvia &
Ysseldyke, 1979), which is not intended to be addressed by the
results of the present investigation. This investigation was
undertaken to determine whether the problem-solving strategies of
the type needed for the successful completion of achievement tests
could be trained. An additional assumption was that reduction of
possible measurement error, on any assessment instrument in common
use, is desirable.
Table 1

Test Score Data

2nd Grade - Analysis of Covariance

Variable            N     Mean      SD    Adj. Mean     F     Prob.

Word reading
  Tx               12    15.58     4.32     17.00      1.01    .326
  Cx               13    20.77     7.65     19.41

Comprehension
  Tx               12    16.42     6.35     18.51      5.10    .035
  Cx               13    26.18     9.00     24.08

Word study
  Tx               12    25.67     5.69     29.44       .47    .50
  Cx               13    31.62    10.05     27.49

Total reading
  Tx               12    57.67    14.34     63.01      2.58    .124
  Cx               13    78.38    22.60     72.93
Table 1 (continued)

3rd Grade

                                                       2-tail
Variable            N     Mean      SD        T        prob.

Comprehension
raw scores
  Tx               18    24.61     7.59      -.36       .725
  Cx               19    25.79    11.98

Word study
raw scores
  Tx               17    29.12     8.09      1.70       .099
  Cx               19    24.95     6.65

Total reading
raw scores
  Tx               18    52.06    16.21  [illegible]    .813
  Cx               19    50.74    17.33

Total battery
scaled scores
  Tx               17   564.00    17.80       .00       .999
  Cx               19   564.00    21.09
4th Grade

                                                       2-tail
Variable            N     Mean      SD        T        prob.

Comprehension
raw scores
  Tx               17    17.71     7.50       .61       .545
  Cx               14    15.79     9.96

Word study
raw scores
  Tx               17    26.53    10.12      1.28       .209
  Cx               14    21.93     9.68

Total reading
raw scores
  Tx               17    44.24    16.54      1.05       .303
  Cx               14    37.71    18.02

Total battery
scaled scores
  Tx               17   572.35    26.15       .04       .968
  Cx               14   572.90    20.60
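
The third and fourth grade T values can be checked from the summary statistics alone. The sketch below assumes an independent-samples t test with a pooled variance estimate; the report does not name the exact procedure, so this is an assumption, although it reproduces the Word Study raw-score values of 1.70 and 1.28. The function name pooled_t is illustrative.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Independent-samples t statistic from summary statistics.

    Assumes equal population variances (pooled estimate). Returns the
    t statistic and its degrees of freedom.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err_diff, df


# Word Study raw scores from Table 1: (N, mean, SD) for Tx, then Cx.
t_grade3, df_grade3 = pooled_t(17, 29.12, 8.09, 19, 24.95, 6.65)
t_grade4, df_grade4 = pooled_t(17, 26.53, 10.12, 14, 21.93, 9.68)

print(f"3rd grade: t({df_grade3}) = {t_grade3:.2f}")
print(f"4th grade: t({df_grade4}) = {t_grade4:.2f}")
```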
3rd and 4th Grades Combined

                                          Standard                  2-tail
Variable            N     Mean      SD     error      T      df     prob.

Comprehension
scaled scores
  Tx               35   559.00    30.58    5.17      .41     65     .680
  Cx               32   556.00    38.77    6.85

Word study
scaled scores
  Tx               34   578.00    31.66    5.43     2.26     65     .027*
  Cx               33   562.00    28.04    4.88

Total battery
scaled scores
  Tx               34   568.00    22.43    3.85      .15     65     .883
  Cx               33   567.00    20.95    3.65
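
The standard error column in the combined table is consistent with the usual standard error of the mean, SD divided by the square root of N. The sketch below checks this for the Word Study and Total Battery rows; the dictionary layout and the function name are illustrative assumptions, and only the numbers come from the table.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean from a group's SD and sample size."""
    return sd / math.sqrt(n)


# (N, SD, reported standard error) for the combined 3rd and 4th grade groups.
rows = {
    "Word study Tx": (34, 31.66, 5.43),
    "Word study Cx": (33, 28.04, 4.88),
    "Total battery Tx": (34, 22.43, 3.85),
    "Total battery Cx": (33, 20.95, 3.65),
}

computed = {name: round(standard_error(sd, n), 2) for name, (n, sd, _) in rows.items()}

for name, (n, sd, reported) in rows.items():
    print(f"{name}: computed {computed[name]:.2f}, reported {reported:.2f}")
```

The same relationship can be used in the other direction to check a doubtful group size, since N is approximately the squared ratio of SD to standard error.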
Figure Caption

Figure 1. Pre-post test.
1. When I don't understand the teacher,
O I go up to the teacher.
O I raise my hand.
O I ask another student.

2. When I mark outside the answer bubble,
O I mark it carefully.
O I can not erase and fix it.
O I might get the answer wrong.

3. After I read the test question,
O I read all the answer choices.
O I think and choose the best answer.
O I guess the best answer.

4. A vocabulary test asks
O the meaning of a word.
O how to read a word.
O how to spell a word.

5. The stop sign tells me to
O stop and then go on.
O stop and check my work.
O stop and lay my pencil down.

6. When I can't read all the words in the answer choices,
O I read the words I know first.
O I guess the answer first.
O I go on to the next question.

7. When I don't know the answer,
O I skip the question.
O I guess the best answer.
O I raise my hand.

8. When I take a comprehension test,
O I read the answer choices first.
O I read the questions first.
O I read the passage first.

9. When I take a syllables test, I look
O for a compound word.
O for a word that has a prefix.
O for a word that is divided the right way.

10. The letter-sound in a letter-sounds test
O can be spelled by different letters.
O are always in the middle of the word.
O are always spelled with the same letters.